PREFACE


Statistics is mostly regarded as a difficult subject, but we are sure that ‘Unified Principles of Statistics’ will be of immense help to students because it is comprehensive, written in simple language and arranged in a systematic order, explaining each and every point at length. Some of the special features of this book are :

• This book has been prepared for B.Com. students as per the latest prescribed syllabus.

• Several illustrations and practical questions have been selected from the examinations of various universities.

• Hints are given for difficult problems to help students work through them.

• The illustrations and practical problems in each chapter are arranged from simple to complex, which will be helpful to the weaker students.

• A number of questions have been classified into segments, such as objective type questions and short answer type questions.


Our thanks are due to all those who took care to point out shortcomings or errors in this book.

We are grateful to Ram Prasad Publications, who have given valuable suggestions to improve the book.

We are sure that this book will serve the purpose of those who are interested in this subject. Valuable suggestions are invited from learned teachers and readers.


—Authors 
STATISTICS


* Meaning, Definitions, Scope, Nature, Functions and Importance of Statistics

* Limitations of Statistics


ORIGIN OF STATISTICS 


The word ‘Statistics’ has been in everyday use for a very long time. In early times kings and emperors collected information about population and economic conditions for the smooth running of administration and political activities. So in the beginning it was considered ‘the science of kings’.

The English word ‘Statistics’ seems to have been derived from the Latin word ‘Status’ or the Italian word ‘Statista’. Both mean a political state.

Baron J.F. von Bielfeld was the first to write, in his book entitled ‘The Elements of Universal Erudition’, that Statistics is the science that teaches us about the political arrangement of all the modern states of the known world. The same type of information can be found in Kautilya’s Arthashastra. In ancient India data on births and deaths were collected in the Maurya kingdom. Similar information may also be seen in the early history of some other countries. Statistics was thus considered a branch of economics in olden times.

The German scientist Gottfried Achenwall was the first to introduce statistics in its present scientific form. Therefore he is known as the ‘father of Statistics’. The contributions of statisticians like Francis Galton, Dr. A.L. Bowley, Karl Pearson, William S. Gosset, R.A. Fisher, Professors Yule and Kendall and P.C. Mahalanobis are of great importance in the development of statistics.


DEFINITIONS OF STATISTICS 


The English word statistics is used in three senses :

(1) Statistics as numerical figures or data.

(2) Statistics as a science (as a subject).

(3) Statistics as measures based on samples.

The first and second meanings are related to the general purpose of statistics, while the third meaning is related to research. According to the dictionary, statistics in the singular refers to the subject as a whole and in the plural to numerical data.

Here statistics is defined as the science of statistics (as a subject).


Statistics as a Subject : Statistics as a subject means statistical methods. The reason is that we draw inferences by analysing and explaining data with statistical methods. There have been many definitions of the term ‘statistics’. We can divide them into two categories :

(1) Narrow Definitions (2) Extensive Definitions

(1) Narrow definition : The definitions of this type given by different authors are mostly incomplete, because they emphasise only some of the aspects of statistics.

Prof. A.L. Bowley has given three definitions : 

(i) “Statistics is the science of counting”. 

(ii) “Statistics may rightly be called the science of averages”.

(iii) “Statistics is the science of the measurement of social organism, regarded as a whole in all its manifestations”.

All the above definitions given by Bowley are incomplete. The first definition is too narrow. It covers only one aspect of statistical methods, namely collection of data. Other aspects like analysis, tabulation, representation and interpretation have been completely ignored. In the second definition emphasis is given to only one device (the average) used in statistical methods. Other devices like dispersion, coefficient of skewness and correlation coefficient are not covered.

Comparison cannot be done on the basis of averages alone. In the third definition Bowley has confined the scope of statistics to man and his social activities. The use of statistics cannot be confined to sociology only. It can also be used in the natural sciences. Besides, only one device of statistics, measurement, is used in this definition. Other devices are not mentioned.

According to Boddington, “Statistics is the science of estimates and probabilities”. Here Boddington has emphasised only two devices of statistical methods, estimate and probability. He has ignored the other devices. Therefore, this definition is also incomplete.


(2) Extensive definition : The definitions of statistics in this category cover all the devices of statistical methods. Some important definitions among them are as follows :

According to Seligman, “Statistics is the science which deals with the method of collecting, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry”.

According to Croxton and Cowden, “Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data”.

According to Prof. Lovitt, “Statistics is the science which deals with the collection, classification and tabulation of numerical facts as a basis for explanation, description and comparison of phenomena.”

In the words of Taro Yamane, “The theory and methods of collecting, tabulating and analyzing numerical data comprise the study of statistics as the subject”.

The aforesaid definitions cover statistical devices like collection, representation, analysis and interpretation of data. These devices are used to draw inferences after analysing the data relating to any event. Therefore, these definitions are appropriate to some extent.


MODERN DEFINITIONS OF STATISTICS 


The definitions given by Seligman and by Croxton and Cowden point out the descriptive character of statistics, while modern definitions point out its analytical and decision-making character. Statistics is now considered a science used to draw inferences about unknown events. The following are some modern definitions of statistics :

According to Wallis and Roberts, “Statistics is a body of methods 
for making wise decisions in the face of uncertainty.” 


According to Kenney and Keeping, “Statistics has usually meant the science and art concerned with the collection, presentation and analysis of quantitative data so that intelligent judgement may be formed upon them.”

According to Spiegel, “Statistics is concerned with scientific methods for collecting, organising, summarizing, presenting and analysing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.”

Thus it is clear that in the science of statistics numerical data are collected and analysed. Valid and reasonable decisions are then taken on the basis of this analysis by scientific statistical methods. Researchers have been pointing out this aspect of statistics in their studies for the past two decades.

From the above definitions it is clear that, as in Economics, authors hold different opinions about the definition of statistics. Yet the following facts should be covered in an appropriate definition of statistics :

(1) Statistics is both a science and an art.

(2) Statistics studies collective facts which can be presented numerically and which are affected by various causes.

(3) Statistical methods, viz. collection, tabulation, presentation, analysis and interpretation of data, are used to draw conclusions about any fact.

(4) The scope of statistics is very wide. Statistical devices are used in every science.

Keeping in mind the above facts, an appropriate definition of statistics will be : “Statistics is a science and an art which studies the collective numerical data related to any sphere of inquiry and affected by multiple causes. Important inferences are drawn by collecting, presenting, analysing and interpreting the data.”


SCOPE OF STATISTICS 


In olden times statistics was used for limited purposes, but nowadays its scope has become very wide. Statistical devices are used as important tools in every sphere of science. On the basis of scope, statistics may be grouped into two parts : (a) Statistical methods, (b) Applied statistics.

(a) Statistical Methods

Statistical methods are used to draw important conclusions by arranging raw data systematically. According to Johnson and Jackson, “Statistical methods are the procedures used in the collection, organization, summary, analysis, interpretation and presentation of data.”

According to Yule and Kendall, “By statistical methods we mean methods specially adapted to the elucidation of quantitative data affected by a multiplicity of causes.”

Statistical methods are comparable to a production process. Just as a factory turns raw material into finished goods through various manufacturing processes, we obtain reasonable conclusions from collected data through the devices of collection, classification, tabulation, analysis, interpretation etc. All these processes make the data easily understandable. Therefore all the processes between the collection of data and the conclusion are known as statistical methods. The following are the important statistical methods :

(i) Collection of data : Collection of data is the first step of a statistical investigation, and it needs careful planning. Planning should be done on the basis of the aims of the investigation. There are two types of data :

(a) Primary data (b) Secondary data. 

Data originally collected in an investigation are known as primary data, while data collected by other persons are called secondary data. The difference between primary and secondary data is a matter of relativity. Data which are primary in the hands of one may become secondary in the hands of another. The data are primary for the individual, institution or agency collecting them, while for the rest of the world they are secondary.

Any research work is performed on the basis of collected data. Hence it is necessary that the data should be collected carefully. The data should be free from defects and errors.


(ii) Classification of data : Collected data are bulky, unsystematic and complicated. It is very hard to draw any conclusion from them in that form. Therefore, collected data should be condensed. They are classified on the basis of some characteristics and attributes.

(iii) Tabulation of data : After classification the next essential step is tabulation. The data are put in a table having different rows and columns. Tabulation helps to answer every type of enquiry.

(iv) Presentation of data : After collection, classification and tabulation the data are ready for presentation. There are various ways in which statistical data may be presented, e.g., statistical tables, diagrams and graphs.

(v) Analysis of data : Various statistical measures are calculated to analyse the presented data, e.g., measures of central tendency, dispersion, coefficient of skewness, correlation, regression, time series, index numbers etc.

(vi) Interpretation : After analysing the data, interpretation is the last and an important stage of any study. Interpretation means drawing conclusions from the data collected and analysed. It is a difficult task and needs a high degree of skill and experience. Correct interpretation will lead to a valid conclusion of the study.
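The analysis step (v) above can be illustrated with a short sketch in Python. The marks of the two students are hypothetical figures chosen only for illustration, and the helper names `summarise` and `pearson_r` are our own. The sketch calculates a measure of central tendency (the mean), a measure of dispersion (the standard deviation) and Karl Pearson's coefficient of correlation.

```python
import statistics

# Hypothetical marks of two students in five tests (illustrative data only).
marks_a = [40, 50, 60, 70, 80]
marks_b = [55, 58, 60, 62, 65]

def summarise(values):
    """Return the mean and the population standard deviation of the values."""
    return statistics.mean(values), statistics.pstdev(values)

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation, computed from first principles."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

mean_a, sd_a = summarise(marks_a)
mean_b, sd_b = summarise(marks_b)
r = pearson_r(marks_a, marks_b)

print("Mean of A:", mean_a, " Standard deviation of A:", round(sd_a, 2))
print("Mean of B:", mean_b, " Standard deviation of B:", round(sd_b, 2))
print("Coefficient of correlation:", round(r, 4))
```

Both students have the same average, yet the marks of the first student are far more scattered. This is why interpretation, step (vi), should not rest on the average alone.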

(b) Applied Statistics

Statistical methods give knowledge of principles and rules. Applied statistics deals with the application of these rules and procedures. Here we decide which statistical methods will be used for the problem under investigation. Data related to Economics, Commerce, Business Administration, Sociology, Psychology, Biosciences etc. are used in applied statistics. Applied statistics may be divided into two parts, descriptive and scientific.

(i) Descriptive Applied Statistics : It deals with known data or records relating to the past or present. Business statistics, health statistics, industrial statistics etc. come under it.

(ii) Scientific Applied Statistics : It deals with the formulation of scientific laws from data collected for descriptive purposes, according to the rules and procedures of statistical methods. For example, an economist may use scientific applied statistics to test the law of demand.


NATURE OF STATISTICS 


Statistics is both a science and an art. This can be explained as follows :

Statistics as a science : Science is a systematised body of knowledge. It analyses causes and effects and establishes the relationship between them. Any branch of knowledge is a science if it possesses the following characteristics :

(i) It should be a systematic body of knowledge.

(ii) Its rules and procedures should be universal.

(iii) It should make clear the relationship between cause and effect.

(iv) It should have the ability to forecast.

Statistics possesses all the above characteristics. Statistics deals with the systematic study of numerical facts (data). The law of inertia of large numbers, the theory of probability etc. are universal laws in statistics. Statistics deals with the analysis and interpretation of data, which amounts to establishing the relationship between cause and effect. We can forecast the future with statistics on the basis of present and past data, e.g., by extrapolation, regression and analysis of time series.
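As a rough illustration of forecasting by extrapolation, the following Python sketch fits a straight-line trend to past data by the method of least squares and extends it one year ahead. The sales figures and the function name `fit_trend` are hypothetical and serve only to show the idea.

```python
def fit_trend(years, values):
    """Fit the least-squares straight line: value = a + b * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
         / sum((x - mean_x) ** 2 for x in years))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical sales (in thousands of units) for five past years.
years = [2001, 2002, 2003, 2004, 2005]
sales = [20, 30, 40, 50, 60]

a, b = fit_trend(years, sales)
forecast_2006 = a + b * 2006  # extrapolate the trend one year ahead

print("Yearly increase (slope):", b)
print("Forecast for 2006:", forecast_2006)
```

Such a forecast assumes that the past trend will continue. It is an estimate, not a certainty.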

Since all the characteristics of a science are present in statistics, statistics is known as a science. According to some scientists, however, statistics as a science is not similar to exact sciences like Botany, Physics, Zoology, Chemistry etc. Actually statistics is a scientific method which helps other sciences to reach conclusions. Statistics is used in all natural sciences. According to Croxton and Cowden, “Statistics is not a science, it is a scientific method.”


Statistics as an art : The word art means action, that is, performing an action properly and effectively to achieve a required aim. Looked at from this angle, statistics may also be regarded as an art. Any subject is an art if it possesses the following characteristics :

(i) Art is the body of those actions by which definite results are obtained.

(ii) Art tells us the devices for achieving an aim and the merits and demerits of that aim.

(iii) Special skill, experience and self-control are required in the practice of an art.

The above characteristics are also present in statistics. Hence statistics is an art. Statistics provides the means and devices to collect, represent, analyse and interpret data for the solution of various statistical problems. To use statistical methods the statistician should be skilled, experienced and self-controlled like an artist. This helps to avoid fallacious and biased conclusions.

Hence we can say that statistics is both a science and an art. According to Tippett, “It (statistics) is both a science and an art. It is a science in that its methods are basically systematic and have general application and an art in that their successful application depends to a considerable degree on the skill and special experience of the statistician and on his knowledge of the field of application, e.g., Economics.”


FUNCTIONS OF STATISTICS 


The applications of statistics are increasing day by day in all 
branches of science and knowledge. The statistical methods are 
used in every sphere of research. The following are main functions 
of statistics. 


(1) To present facts in a definite form : The most important function of statistics is to present general statements in a precise and definite form. Conclusions stated numerically are definite and hence more convincing than conclusions stated qualitatively. This can be readily understood from a simple example. In an advertisement, statements expressed numerically are more attractive and more appealing than those expressed in a qualitative manner. The statement, “We have sold more computers this year”, is certainly less attractive than “Record sale of 1,00,000 computers in 2006 as compared to 60,000 in 2005.” The latter statement emphasises the growing popularity of the advertiser's computers in a much better manner.

(2) To simplify complex statistical data and to make them understandable : Collected data are generally unsystematic. In that form they do not help us to understand their meaning or underlying trends. Such complex data may be reduced to totals, averages, percentages etc. and presented either diagrammatically or graphically. These devices help us to understand the characteristics and meaning of the data. Single figures in the form of averages can be grasped more easily than a mass of data comprising thousands of facts. Similarly, diagrams and graphs give a bird’s-eye view of the entire data and, therefore, the information presented is easily understood.

(3) To compare simplified data : Statistics deals with the comparative study of facts related to any problem. Certain facts, by themselves, may be meaningless unless they can be compared with similar facts at other places or at other periods of time. Some of the modes of comparison provided by statistics are averages, dispersion, correlation and other coefficients. For example, if we want to compare the marks obtained by two students, we may calculate the average of their marks. Thus statistics affords suitable techniques for comparison. According to Bowley, “A chief practical use of statistics is to show relative importance, the very thing which an individual is likely to misjudge. Statistics are almost always comparative.”

(4) To help in the formulation of policies : Statistics plays an important role in the formulation of economic and social policies. We can make proper policies by using the analysis of data. With its help we can also prepare family budgets and frame policies regarding the standard of living of the people. Marshall has rightly said, “Statistics are the straw out of which I, like every other economist, have to make the bricks.”

(5) To forecast for the future : Statistical methods provide helpful means of forecasting future events. We can make plans and policies well in advance of the time of their implementation. A knowledge of future tendencies is very helpful in framing suitable policies and plans. For example, if a statistician forecasts that the population of his country will become 100 crore by 2050, then his government will have to plan to control the population and increase production. Dr. Bowley has beautifully summed up the importance of statistical estimates as, “A statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it is likely to be more accurate than a casual observer’s impression, and in the nature of things can only be disproved by statistical methods.”

(6) To help in the formulation and testing of hypothesis : Statistical methods are extremely helpful in the formulation and testing of hypotheses and in developing new theories for the social sciences. For example, hypotheses such as whether chloromycetin is effective or not in checking typhoid, whether a particular coin is unbiased or not, whether scheduled caste students have benefited from extra coaching etc. can be formulated and tested by appropriate statistical tools.

(7) To enlarge individual experience and knowledge : Like other sciences, statistics also helps in enlarging individual experience and knowledge. Statistical methods are extremely useful for strengthening a person's power of reasoning. One can give an appropriate solution to any problem with the help of statistics. Without statistics knowledge is incomplete and inefficient; statistics enables one to enlarge one's horizon. One can gain various types of experience through statistical investigation. Statistics can critically analyse old theories and laws and produce new rules. According to Dr. Bowley, “The proper function of statistics indeed, is to enlarge individual experience”.
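The coin example mentioned under function (6) can be sketched in Python as follows. The figures (16 heads in 20 tosses), the 5% level of significance and the function name `binomial_two_sided_p` are assumptions made only for illustration. The sketch uses an exact binomial test, which is one of several tools that could be applied to this hypothesis.

```python
from math import comb

def binomial_two_sided_p(heads, tosses, p=0.5):
    """Exact two-sided p-value under the hypothesis P(head) = p.

    It adds up the probabilities of all outcomes that are no more
    likely than the observed number of heads.
    """
    probs = [comb(tosses, k) * p ** k * (1 - p) ** (tosses - k)
             for k in range(tosses + 1)]
    observed = probs[heads]
    # The small tolerance guards against floating-point rounding on ties.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# Hypothetical experiment: 16 heads observed in 20 tosses.
p_value = binomial_two_sided_p(heads=16, tosses=20)
reject = p_value < 0.05  # assumed 5% level of significance

print("p-value:", round(p_value, 4))
print("Reject the hypothesis that the coin is unbiased:", reject)
```

A small p-value means that such a lopsided result would rarely occur with an unbiased coin, so the hypothesis of an unbiased coin is rejected at the chosen level.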


IMPORTANCE OF STATISTICS 


Statistical methods have become useful tools in the world of affairs. Since ancient times ruling kings and chiefs have relied heavily on statistics in framing suitable military and fiscal policies. In recent years the importance of statistics has increased tremendously.

Statistics is now indispensable in all subjects. There is hardly any field, whether it be trade, industry, commerce, economics, biology, astronomy, physics, chemistry, education, medicine, sociology, psychology or meteorology, where statistical tools are not applicable. That is why it is said, “Statistics is what statisticians do.”

According to Wallis and Roberts, “Statistics is a tool which can be used in attacking problems that arise in almost every field of empirical enquiry.” Let us now examine the importance of statistics in a few fields.


(1) Importance of Statistics in Administration : Statistics originated as an aid to proper administration and rule. In olden days the king or emperor used to collect data regarding population, food production, taxes, crimes and military strength in order to rule systematically and to frame fiscal and military policies. The concept of a state has changed from that of simply maintaining law and order to that of a welfare state. Statistics are extremely helpful in promoting human welfare in areas like education, health, transport and telecommunications. Hence statistics are the eyes of administration.

The income and expenditure for the next year can be estimated on the basis of the income and expenditure data of the current year. What is the country’s population ? What is the growth rate ? What is the food production ? What will be the food production next year ? What is the direction of trade ? What is the condition of export and import trade ? What is the tax policy of the government ? What is the tendency of prices ? Are prices increasing or decreasing ? The government makes policy after collecting and analysing the data related to these questions. Statistics are analysed by statistical methods to evaluate progress in financial matters. Thus, it is clear that statistics has a significant role in administrative matters.


(2) Importance of Statistics in Planning : The modern age is the age of planning. All countries, developed or developing, are trying to solve their economic problems through planning. For developed countries planning is the means of maintaining the level of economic development, while for developing countries it is the means of achieving economic development.

Planning is not possible in the absence of sufficient and reliable data. Planners can get information about natural resources, capital, technical knowledge, national income, savings etc. only through statistical data. Through statistics they can also get information about the short term and long term requirements of the country. They can decide the priorities for the country on the basis of statistics. They can also set financial targets for different sectors on the basis of data, and estimate the financial resources required to achieve these targets. Thus, it is clear that statistics or data are the foundation of planning.

Not only this, we can also evaluate the outcomes of our planning with the help of data by statistical methods. On the basis of data we can determine in which sectors we have achieved the targets and in which sectors we have not. What are the reasons behind slow development ? What are the devices to remove them ? All these problems can be solved by analysing the data. Hence Tippett has correctly written, “Planning is the order of the day and without statistics planning is inconceivable.” It is also said, “Planning without statistics is a ship without rudder and compass.” Just as we doubt and fear to sail a ship to its destination without rudder and compass, in the same way we cannot achieve the various targets of financial development through planning without reliable and adequate statistics. Planning without statistics is like searching in a dark night for a black cat that is not there. Explaining the importance of statistics, the Indian Planning Commission has stated, “Planning on the basis of inadequate and inaccurate statistics is worse than no planning at all.” The famous Indian statistician Prof. P.C. Mahalanobis had set the targets and structure of the second five year plan of India, but it failed due to the non-availability of accurate and reliable statistics. Hence the availability of sufficient, accurate and reliable statistics and a knowledge of scientific statistical methods are necessary for planning.


(3) Importance of Statistics in Economics : Statistics is extremely useful in the field of Economics. Explaining the dependence of Economics on statistical data, the famous economist Prof. A. Marshall said, “Statistics are the straw out of which I, like every other economist, have to make the bricks.” Economics is the subject that deals with the consumption, production, investment and distribution of wealth. Statistical data and statistical methods are of immense help in the proper understanding of economic problems and in the formulation of economic policies.

For example, what to produce, how to produce, for whom to produce and how much to produce are all questions that need a lot of statistical data. The producer should have reliable statistics of production for adjusting the supply according to the demand.

Statistics of consumption tell us how people of different classes of society spend their income. Analysis of such data is very useful for knowing the standard of living and the taxable capacity of the people. Statistical data and graphs are useful for analysing and explaining the law of diminishing marginal utility, the law of equi-marginal utility, the law of demand etc. Thus statistics are also very useful in the field of consumption.

In the field of exchange we study markets, the law of price based on supply and demand, production, cost of production, banking and credit instruments, etc. What will be the price of a particular commodity if its supply increases or decreases in the market ? What price should a monopolist charge in order to reap maximum profits ? These are questions that can be answered by analysing statistical data. In fact statistics are the foundation-stone of the theory of exchange.

In distribution too statistics are of great importance. We cannot calculate national income and study the problems related to its distribution without statistics. We have to rely heavily on statistics to understand and solve the problems of rising prices, rising population, unemployment and poverty, and to reduce disparities in the distribution of income and wealth. Evaluations of the government’s tax policy, budget, public expenditure, public debt etc. are expressed with the help of statistics.

Statistical methods help not only in formulating economic rules and policies but also in evaluating their effects. Apart from economic policy, the development of economic theory has also been facilitated by the use of statistics. The complexity of modern economic organisation has rendered deductive reasoning inadequate and difficult. Statistics is now being used increasingly not only to develop new economic concepts but also to test the old ones.

The wide application of statistical methods in the theory of economics has led to the development of a new discipline called Econometrics. In this subject we test the validity of economic theories and rules by mathematical and statistical methods. Economic rules are expressed in the form of mathematical models, and relevant data are collected to test these models.


(4) Importance of Statistics in Business and Commerce : Statistical knowledge is essential for success in the field of business, industry and commerce. How much is to be produced ? How much is the demand ? Where is the demand ? Will prices increase or decrease ? What is the condition of the supply ? What is the government policy ? How many workers will be needed ?

A shrewd manufacturer will want to know the answers to all these questions well in advance, and statistical analysis is very useful here. Statistics is thus a useful tool for the successful businessman.

Estimates and probabilities have an important place in the field of business and commerce, because the businessman takes decisions on the basis of these estimates and probabilities. Success in business depends upon correct decisions. According to Ya-lun Chou, “It is not an exaggeration to say that today nearly every decision in business is made with the aid of statistical data and statistical methods.” According to Boddington, “The successful businessman is one whose estimate most closely approaches accuracy.”

The price statistics play an important role in the field of business. 
According to M.M. Blair, “If all price statistics were removed from all 
papers, magazines, radio and telegraphic reports for a single day, 
the business world would be paralysed. If all statistics now available 
were removed from the world for one year, utter economic chaos and 
ruin would result.” 


Like business, industry also finds statistics extremely helpful. All the work of an industry, from its establishment to the sale of the commodity it produces, is based upon statistics. For example, where is an industry to be established ? For this, land, labour, price of raw materials, transport facilities, local taxes, wages, availability of skilled labour etc. are the things that should be kept in mind. A sound decision may be taken by collecting statistics related to these problems. Market surveys and a knowledge of consumers' interests and preferences are essential to know the demand for a commodity. For all these, forecasting is an essential statistical method.

Production planning is an important part of scientific management in a big factory. A production schedule is prepared for the production of various commodities. It is used to check the loss due to overstocking or to the inability to supply commodities according to demand. Hence the production plan is related to sales forecasting. Statistical methods of analysis are helpful in all this work. Statistics is very widely used in the quality control of commodities. In production engineering, statistical methods are extremely useful for finding whether a product conforms to specification or not.


(5) Importance of Statistics to Bankers, Brokers and Insurance 
Companies : Statistics is extremely useful for banks, stock brokers, 
speculators, insurance companies, social workers, leaders of labour 
organizations and politicians. For example, banks have to make a very 
critical study of their cash requirements, otherwise they may find 
that they are short of cash and their existence is at stake. 
Similarly, the premium rates of life insurance companies are based 
upon a very careful study of the expectation of life and the age of 
people. Life expectation, life tables and population statistics are 
all prepared with the help of probability theory. The chance of a 
person surviving at a specific age can be determined by probability 
theory. An insurance company advertised, “We do not know who will 
die, but we know how many will die.” Precise evaluation in an 
insurance company can be done with the help of statistics. With 
statistics, the speculators, brokers and investors in a stock 
exchange can estimate whether the price of a bond will increase or 
decrease. 

Nowadays politicians are also using statistics. They are 
eager to know their chances of winning an election, and they can 
estimate these chances from exit polls. Leaders of labour unions, 
industrial researchers and social workers are also using 
statistics to enlarge their knowledge and experience. In fact, there 
is hardly any field today that one can find complete without 
statistical data and statistical methods. H. G. Wells was right when 
he said, “Statistical thinking will one day be as necessary for 
efficient citizenship as the ability to read and write.” 


(6) Universal Utility of Statistics : Ours is indeed the statistical 
age. The application of statistical methods is increasing in every 
branch of science. Statistical tools are used in almost all sciences, 
especially astronomy, geology, physics, biology, medicine, 
psychology and meteorology. Statistics is, in fact, creeping into 
every department of human activity. It has now become 
indispensable in all phases of human endeavour. Tippett rightly 
stated, “Statistics affects everybody and touches life at many 
points.” 

According to Dr. Bowley, “A knowledge of statistics is like a 
knowledge of a foreign language or of algebra. It may prove of use at 
any time under any circumstances.” 


LIMITATIONS OF STATISTICS 


Statistics has an important place in the modern era, but it is not 
without limitations. It cannot be applied to all kinds of phenomena 
and cannot be made to answer all our queries. The following are the 
important limitations of the science of statistics. 


(1) Statistics does not deal with individual items : Statistics 
deals only with aggregates of facts and no importance is attached to 
individual items. It is, therefore, suited only to those problems 
where group characteristics are to be studied. Tippett stated, 
“Statistics is essentially totalitarian because it is not concerned with 
individual values, but only with classes.” 

(2) Statistics deals only with quantitative characteristics : All 
statistics are numerical statements of facts, and these form the 
subject-matter of Statistics. Statistics deals with only those 
subjects of inquiry which are capable of being quantitatively 
measured and numerically expressed. This is an essential condition 
for the application of statistical methods. 

However, not all subjects can be expressed in numbers. Health, 
poverty, intelligence and honesty are instances of subjects that defy 
the measuring rod and hence are not directly suitable for statistical 
analysis. It is true that efforts are being made to accord 
statistical treatment to subjects of this nature also. For example, 
the intelligence of students may be compared on the basis of the 
marks obtained by them in an examination. 


(3) Statistical results may be misleading if studied without 
proper context : Statistical results might lead to fallacious 
conclusions if they are quoted out of their context. The argument 
that “in a country 15,000 vaccinated persons died of smallpox, 
therefore, vaccination is useless” is statistically defective, since 
we are not told what percentage of the persons who were not 
vaccinated died. W.I. King also stated, “One of the shortcomings of 
statistics is that they do not bear on their face the label of their 
quality.” 

(4) Statistical laws are true in the long run and on an average : 
The conclusions obtained statistically are not universally true. They 
are based on probability theory, so they are true only in the long 
run and on an average. 

For example, if an unbiased coin is tossed, the probability of 
getting a head (or a tail) is 1/2. But this result holds only when 
the coin is tossed a large number of times. If the coin is tossed 
only four times, it is not necessary that we would get 2 heads and 
2 tails. So statistical laws are not exact. They are only 
approximations. 


(5) Statistics is only one method of studying a problem : 
There may be various methods of studying a problem. Statistics is 
only one of them. Hence statistical conclusions should be 
supplemented by other evidence and facts. Statistical methods do not 
provide the best solution under all circumstances. Croxton and Cowden 
stated, “It must not be assumed that statistical method is the only 
method to use in research, neither should the method be considered 
the best attack for every problem.” 

(6) Statistical data should be uniform and homogeneous : A 
statistical conclusion can be drawn only by comparing uniform and 
homogeneous data. For example, suppose the monthly income of Ram is 
₹ 500, the height of Shyam is 5 feet 6 inches and the weight of Mohan 
is 40 kg. No conclusion can be drawn from these data as they are not 
uniform and homogeneous. If the monthly incomes of these three 
persons are given, we can draw a conclusion about whose economic 
condition is good or poor. 

(7) Statistics can be used only by experts : Statistics can be 
misused to establish wrong conclusions and, therefore, should be 
handled only by experts. Only one who has an expert knowledge of 
statistical methods can handle statistical data properly. Data 
placed in the hands of an inexpert person may lead to fallacious 
results. According to Yule and Kendall, “Statistical methods are the 
most dangerous tools in the hands of the inexperts.” 

Dr. Bowley stated, “Statistics only furnish a tool, necessary though 
imperfect, which is dangerous in the hands of those who do not 
know its uses and deficiencies.” Moreover, experience and skill are 
required to draw sensible conclusions from the data; otherwise, 
wrong conclusions may be reached. W.I. King rightly stated, “The 
science of statistics is a most useful servant, but only of great 
value to those who understand its proper use.” 
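
The coin example given under limitation (4) can be checked numerically. The Python sketch below is only an illustration: the function names, the seed and the numbers of tosses are assumptions made for this example and are not taken from the text. It first computes the exact chance of getting 2 heads in 4 tosses of an unbiased coin, and then simulates longer and longer runs to show the proportion of heads settling near 1/2.

```python
import random
from math import comb


def prob_exact_heads(n_tosses, n_heads):
    """Probability of exactly n_heads heads in n_tosses tosses of a fair coin."""
    return comb(n_tosses, n_heads) / 2 ** n_tosses


def head_share(n_tosses, rng):
    """Proportion of heads observed in n_tosses simulated fair tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# 6 of the 16 equally likely outcomes of four tosses contain exactly 2 heads.
p_two_of_four = prob_exact_heads(4, 2)
print("P(2 heads in 4 tosses) =", p_two_of_four)

rng = random.Random(2024)  # arbitrary seed, used only to make the run repeatable
shares = {n: head_share(n, rng) for n in (4, 100, 10_000, 100_000)}
for n, share in shares.items():
    print(f"{n:>7} tosses : proportion of heads = {share:.4f}")
```

Four tosses give exactly 2 heads in fewer than half of all cases, while the proportion of heads in a long run lies close to 1/2. This is what is meant by a law that is true "in the long run and on an average".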


THEORETICAL QUESTIONS 


Long Answer Questions 

1. Define science of statistics. (Bhopal 2006; Sagar 2006) 

2. Define ‘Statistics’ and discuss its scope and limitations. (Sagar 2006; Vikram 
2006) 

3. Explain the scope, utility and limitations of ‘Statistics’. (Jiwaji 2005) 

4. Explain the scope and limitations of ‘Statistics’. 

5. Define Statistics and explain the importance of statistics in business and 
commerce. (Rewa 2006) 

6. Define Statistics and elucidate its scope and importance. 

7. Bring out clearly the functions and limitations of Statistics. (Vikram 2004; 
Sagar 2006) 

8. “Statistical methods are most dangerous tools in the hands of the inexperts.” 
Discuss and write the limitations of statistics. (Bilaspur 2006) 

9. “All the statistical data are numerical facts but all the numerical facts are not 
statistical data.” Comment on the above statement. (Bilaspur 2005; Indore 
2006) 


10. Explain the following statements : 
(i) “Statistics are like clay of which you can make a God or a devil as you 
please.” 
(ii) “There are three kinds of lies : lies, damned lies and statistics.” 

11. What do you mean by ‘Statistical tools’ ? Explain their importance in the 
modern age. 


12. “Statistics affects every body and touches life at many points.” Explain it. 


13. Critically examine the subject-matter of Statistics. What are the limitations of 
Statistics ? 

14. “Statistics is not a science but it is a scientific method.” Critically examine 
this statement. (Indore 2006; Ravishankar 2003) 


15. Comment on the following statements : 
(i) “Statistics is the science of counting.” 
(ii) “Statistics is the science of estimates and probabilities.” 
(iii) “Statistics is the science of average.” 


16. “I have no faith in Statistics.” In the light of this statement explain the use and 
misuse of Statistics. (Indore 2000) 

17. What are the main limitations of Statistics ? Can these shortcomings be 
overcome ? (Bilaspur 2006) 

18. Statistics affects everybody and touches life at many points. Comment. 
(Ravishankar 2003) 

Short Answer Questions 


1. Describe any two limitations of statistics. 
2. Explain any two points of importance of statistics. 
3. Define the meaning of statistics. 


4. Explain the scope of statistics. 


OBJECTIVE QUESTIONS 


Choose the Correct Answers : 


1. “Statistics is a science of counting” by whom this definition is given ? 
(a) Boddington (b) Bowley (c) Parretto (d) Croxton and Cowden 


2. What is prepared by government with the help of data ? 

(a) Budget (b) Province (c) Money (d) Officer 

3. “Statistics is the science of estimates and probabilities.” Whose definition is this ? 

(a) Webster (b) Secrist (c) Boddington (d) Yule and Kendall 


4. “Data are just a figure for man in the street.” Whose definition is this ? 
(a) Achenwall (b) Tippett 
(c) Johnson and Jackson (d) Yule and Kendall. 


5. Which of the following is not a characteristic of data ? 
(a) Data are numerically expressed. 

(b) There is reasonable standard of accuracy in data 

(c) Data are accepted by enumeration or estimation. 

(d) Data are personal facts. 

[Ans. : 1. (b), 2. (a), 3. (c), 4. (b), 5. (d).] 
STATISTICAL INVESTIGATION 


— ° Main Stages of Statistical Investigation 
— ° Planning of Statistical Investigation 
— ° Types of Statistical Investigation 

— ° Statistical Unit 

Investigation means inquiry or search. When this 
investigation or search is conducted by statistical methods, it is 
called a statistical investigation. 

The scientific and systematic collection of data in any field, their 
analysis with the help of various statistical methods and the drawing 
of conclusions from them, all carried out according to a suitable 
plan in order to achieve a required goal, is known as statistical 
investigation. 

The person who carries out the statistical investigation is known as 
the investigator. 


Main Stages of Statistical Investigation 

Following are the important steps of statistical investigation : 

(1) Planning of inquiry : Planning of inquiry should be determined 
with great precautions keeping in mind the scope, nature and 
purpose of the investigation. 

(2) Collection of data : After making the plan of investigation, the 
data relating to the problem under study are collected. There are 
various methods of doing so, and a suitable one is chosen. If 
primary data are required, questionnaires should be prepared to 
obtain them. 

(3) Editing of data : The collected data are sorted and unnecessary 
data are removed. If any errors or mistakes are present in the 
data, an effort should be made to remove them. Thus the data become 
capable of being used. 

(4) Classification and Tabulation of data : The data obtained 
above are unsystematic. They are classified into various series and 
groups and presented in tables, so that analysis becomes easy. 

(5) Analysis of data : After the presentation of data, they are 
analysed using various statistical methods. Various types of 
mathematical measures are involved in these methods e.g., mode, 
median, mean, standard deviation, correlation, regression etc. 

(6) Interpretation and reporting of data : This is the last step of 
the investigation. Here proper conclusions are drawn on the 
basis of the data and the final report is prepared. The report should 
be written in simple and comprehensible language and it should be as 
short as possible. 
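
The editing, tabulation and analysis stages listed above can be sketched in a few lines of Python. This is a minimal illustration only: the marks are hypothetical figures invented for the example, and the class width of 20, the validity range of 0 to 100 and the helper name `class_label` are assumptions, not part of the text.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

# Hypothetical marks as collected. None stands for a blank return and
# 150 for an impossible entry; both are assumed errors for this sketch.
raw_marks = [45, 52, None, 61, 45, 150, 38, 70, 45, 52]

# Editing of data: drop blanks and values outside the valid range 0-100.
edited = [m for m in raw_marks if m is not None and 0 <= m <= 100]


def class_label(mark, width=20):
    """Class interval containing the mark (upper limit excluded)."""
    lower = (mark // width) * width
    return f"{lower}-{lower + width}"


# Classification and tabulation: frequency of each class interval.
table = Counter(class_label(m) for m in edited)

# Analysis of data: a few of the measures named in stage (5).
summary = {
    "mean": mean(edited),
    "median": median(edited),
    "mode": mode(edited),
    "standard deviation": pstdev(edited),
}

for interval in sorted(table):
    print(interval, table[interval])
for name, value in summary.items():
    print(name, "=", round(value, 2))
```

Interpretation and reporting remain a matter of judgment: the figures printed here are only the material from which the investigator draws conclusions.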


PLANNING OF STATISTICAL INVESTIGATION 


Whether the results obtained from a statistical investigation are 
correct and useful depends upon the care with which data collection 
is planned. If data are not collected carefully and according to a 
suitable plan, time, money and labour are all wasted and the results 
obtained will be fallacious and far from reality. Hence, planning 
is essential before data collection. This plan is known as the 
planning of statistical investigation. The following points are kept 
in mind under it : 


(1) Definition of the Problem : Before the collection of data is 
commenced, it is essential to know the form and nature of the 
problem. No plan of investigation can succeed unless clear 
information about the problem is obtained; otherwise the money, time 
and labour spent on data collection will be wasted. 


(2) Purpose of Inquiry : After defining the problem it is essential 
to know the purpose of the inquiry before the work of investigation is 
commenced. The statement of the purpose determines which data are to 
be collected and how they are to be used. 

The purpose of the inquiry may be general or specific. Under a 
general purpose, data related to census, literacy, production etc. 
are collected. This type of investigation is done for the public 
welfare and is generally organized on a large scale. An inquiry for a 
specific purpose is performed on a rather small scale and is done for 
the welfare of a particular class, e.g., the data related to the 
wages of labourers working in the paper mills of Madhya Pradesh. 

(3) Scope of inquiry : Having decided the purpose of the inquiry, the 
next step is to determine the scope of the investigation. The scope 
will be right if it is decided according to the problem. Here it is 
decided what the subject-matter will be and which geographical, 
natural and economic areas are to be covered, e.g., the scope 
of an inquiry may be limited to a particular state or an industrial 
city, or it may cover the whole of India or other countries. The 
scope of inquiry is influenced by three factors : the purpose of the 
investigation, the availability of time and the availability of 
resources. 

(4) Period of Investigation : Having decided the scope of the inquiry, 
one should determine the period over which it will be conducted. The 
period depends upon the nature of the problem and the purpose of the 
inquiry. For example, if a Pay Commission does not present its report 
for a long time, all the work done by it may become useless, because 
prices (as shown by the wholesale price index number) and the cost of 
living will have changed in the meantime. Hence, at the time of 
planning the inquiry, the time limit should be kept in mind. 


(5) Sources of Information : The next step in planning is to decide 
the sources from which the data are to be collected. Sources of 
information may be either primary or secondary. In the case of 
primary sources, the data are collected by the investigator himself, 
for instance by having questionnaires filled in through direct 
personal interviews. In the case of secondary sources, the data are 
already available from published or unpublished sources. The 
sources are decided on the basis of the nature and purpose of the 
inquiry and the money allotted for it. 

(6) Types of Investigation : For statistical investigation, it is 
essential that data should be collected correctly. For this, a 
knowledge of the various types of statistical inquiry is essential, 
because a single type of inquiry cannot be used in all situations. 
The decision about the type of inquiry is taken on the basis of the 
scope and purpose of the inquiry and the time and resources available. 

(7) Determination of statistical units : Statistical units are the 
means of measurement that bring uniformity and comparability 
to the data. The collection, presentation and analysis of data are 
made on their basis. The unit must be uniform throughout the inquiry, 
otherwise analysis will not be possible. 

(8) Degree of Accuracy : Generally, absolute accuracy is not 
possible in an inquiry. More labour and money are required to 
approach absolute accuracy, and even then it is seldom attained. The 
results of statistical analysis, however, will not be significantly 
affected by the lack of absolute accuracy. So the degree of 
accuracy of the inquiry should be decided on the basis of the problem. 

(9) Selection of suitable method of collecting data : At the time 
of planning an inquiry, it should be decided which method of data 
collection is appropriate for the investigation. Not every method is 
suitable for every type of investigation. Hence a specific method 
should be selected on the basis of the time and resources 
available for the inquiry. 

(10) Organization of Investigation : Before the inquiry is started, 
it is essential to decide how it will be organized. That is, how 
many investigators will be required ? Will they need training ? What 
will be their authority and responsibility ? What will be the scope 
of their work ? etc. 


TYPES OF STATISTICAL INVESTIGATION 


The main types of statistical investigation may be as follows : 

(1) Census or Sample Investigation : In a census, information 
is obtained from each unit, while in the sampling method only some 
selected units are investigated and the result of this investigation 
is taken as the result for the whole (aggregate). If we want data 
on literacy in a village, we can use either method. In a census, 
literacy data will be collected by contacting all the villagers 
separately and the percentage of literates will be determined, while 
in the sampling method some houses of the village will be selected, 
data will be collected from them and the percentage will then be 
determined. In statistical inquiry, the sampling method is mostly 
used as it saves time, money and labour. 

(2) Direct or Indirect Investigation : A direct investigation is one in 
which data are collected directly by the investigator. For example, 
data relating to wages can be collected by contacting all the 
labourers directly, but their efficiency cannot be tested by 
contacting them directly. It can be determined only indirectly, by 
examining their work. Both types of investigation are used according 
to time and need. 

(3) Confidential or Public Investigation : An investigation which 
is conducted confidentially by the government in the public interest, 
or by a business institution for its own private purposes, is called 
a confidential investigation. The planning and method of such an 
investigation are kept confidential and its results are not 
published. In contrast, an investigation which is conducted for 
common purposes and whose results are published for the general 
public is called a public or open investigation. 

(4) Original or Repetitive Investigation : An original investigation 
is one which is conducted for the first time. In such an 
investigation the entire planning is original, since no investigation 
of this kind has been conducted before. A repetitive investigation 
is carried on in continuation of a previous investigation, e.g., the 
census. 

(5) Regular or Ad hoc Investigation : A regular investigation 
is one in which the data are collected regularly by a permanent 
staff and are published, e.g., the publication of the cost of 
living index number by the Reserve Bank. In contrast, an ad hoc 
investigation is carried out once to achieve a specific object, for 
example, the investigation into monopolies in India conducted by 
the Hazari Committee. 

(6) Pilot or Comprehensive Investigation : An investigation 
which is conducted before the main investigation, as a trial, is 
known as a pilot investigation, while the investigation that follows 
the pilot investigation is called comprehensive. 

(7) Extensive or Limited Investigation : Under an extensive 
investigation, data are collected about all aspects of a problem, 
while under a limited investigation, information is obtained about 
only one or two aspects of the problem. 

(8) Postal or Personal Investigation : Under a postal investigation, 
the persons concerned are contacted through the postal service, that 
is, a questionnaire is prepared and sent to them, and they fill it in 
and return it by post. This procedure is cheap but less reliable, 
and it can be used only in a literate society. If data are collected 
by personal contact, it is called a personal investigation. 
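
The difference between a census and a sample investigation, described under type (1) above with the example of literacy in a village, can be illustrated with a small Python sketch. Everything numerical here is hypothetical: a village of 1,000 persons of whom 620 are literate, a sample of 100 persons, and an arbitrary seed. For simplicity the sketch samples persons rather than houses.

```python
import random

rng = random.Random(7)  # arbitrary seed, used only to make the run repeatable

# Hypothetical village: 1 marks a literate person, 0 an illiterate person.
village = [1] * 620 + [0] * 380
rng.shuffle(village)

# Census investigation: every unit is contacted.
census_pct = 100 * sum(village) / len(village)

# Sample investigation: only 100 randomly selected units are contacted,
# and their result is taken as an estimate for the whole village.
sample = rng.sample(village, 100)
sample_pct = 100 * sum(sample) / len(sample)

print("Literacy by census :", census_pct, "per cent")
print("Literacy by sample :", sample_pct, "per cent")
print("Units contacted    :", len(village), "against", len(sample))
```

The sample figure will usually differ a little from the census figure. That difference is the price paid for contacting one-tenth of the units.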

DIFFERENCE BETWEEN CENSUS AND SAMPLE INVESTIGATION 

Following are the main differences between census and sample 
investigation : 

Base of Difference (Census Investigation compared with Sample Investigation) 

1. Inquiry Technique 
2. Resource 
3. Usefulness 
4. Inevitability 
5. Degree of accuracy 
6. Organization and Administration 
7. Estimate of Error 
8. Scope of inquiry 
9. Nature of units 


STATISTICAL UNIT 


Before the investigation the statistical unit must be determined, 
because the data are collected on its basis and only then analysed 
and interpreted. However, the determination of the statistical unit 
is not as simple as it appears to be. A statistical unit may be 
defined as follows : “A statistical unit is an attribute or group of 
attributes conventionally collected so that individuals or objects 
possessing them may be counted or measured for statistical inquiry.” 

Hence it is essential to define the statistical unit clearly, so that 
uniformity and comparability are maintained in the data from the 
beginning to the end of the investigation and unnecessary data are 
not collected. 

Following are the essential requirements of statistical unit : 


(1) Specific and Unmistakable : The definition of the statistical unit 
should be simple and clear in each investigation. If a word has 
different meanings, it should be made clear in which specific meaning 
the word is being used in the inquiry. For example, the working day 
is 7 hours in some places, 8 hours in others and 10 hours in still 
others. In such a situation, the investigator should decide at the 
beginning which measure of the day will be accepted. 

(2) Uniformity : Statistical data are valuable chiefly for purposes 
of comparison, so uniformity in the unit is essential. If the unit 
changes over time, it will be impossible to compare the data and the 
results obtained will be wrong and fallacious. 


(3) Stability : The unit should be selected in such a way that there 
is no possibility of sudden change in it, i.e., its value should be 
stable. If the value of the unit changes repeatedly, it will have an 
adverse effect on the investigation. 

(4) Appropriateness : The unit should be appropriate 
to the inquiry, that is, the objects which are measured in any unit 
should be defined in the same unit. If the inquiry is to be conducted 
on a large scale, its unit should be large, and if the inquiry is to 
be conducted on a small scale, its unit should be small. 


(5) Unanimity : The unit should be one whose scale is universal 
and equally prevalent in all places, so that no confusion can arise 
in any situation. If different units are used in different 
places, the task of data collection becomes very complicated. For 
example, if the kilometre is used to measure distance in one place, 
the mile in another and the Kos (equivalent to two miles) in a third, 
the data lack uniformity. 

(6) Comparableness : The unit should be selected in such a way 
that comparison with other available data is possible. The 
utility of data lasts only as long as they remain comparable. 
Equality and homogeneity in the selected units are therefore 
essential, because they make comparison easy. Wherever possible, 
commonly accepted units should be used so that no problem of 
comparison remains. 


TYPES OF STATISTICAL UNIT 

Statistical units may be classified as follows : 

STATISTICAL UNITS 
   Units of measurement and enumeration : Simple units, Composite units, Hypothetical units 
   Units of analysis and interpretation : Rate, Ratio, Coefficient 

(1) Unit of Measurement and Enumeration 

These units are used to collect the data. These units are classified 
as follows : 


(i) Simple units : The units which are commonly used in daily 
life are known as simple units, e.g., kilometre, hour, litre etc. 

(ii) Composite units : A unit formed by combining two simple units 
is known as a composite unit, e.g., passenger-kilometre, 
ton-kilometre, machine-hour, Rupees per kilo etc. 

(iii) Hypothetical units : For comparison, hypothetical 
units are sometimes used, e.g., horse-power. 

(2) Units of Analysis and Interpretation 


These units are used for making comparison and interpretation of 
data. They include : 


(i) Rate : When the relation between two numerical groups is 
expressed per hundred, per thousand or in some other form, it is 
known as a rate, e.g., birth rate, death rate, interest rate etc. 
The interest rate is quoted per hundred, while birth and death rates 
are quoted per thousand. 

(ii) Ratio : The relation between two quantities of the same kind is 
known as a ratio. It is determined by division. For instance, if the 
incomes of Gaurav and Ishan are ₹ 80,000 and ₹ 60,000 per month, then 
the ratio of their incomes will be 80,000 : 60,000 or 4 : 3. 

(iii) Coefficient : Coefficient means rate per unit. For instance, if 
the interest rate for a home loan is 8%, then its coefficient will be 
8/100 = 0.08. If the coefficient is multiplied by the total, the 
related quantity is obtained, e.g., if interest is to be calculated 
on ₹ 10,000, it will be ₹ 10,000 x 0.08 = ₹ 800. 

The coefficient is expressed in the form of a formula as : 

C = Q / N 

where, C = Coefficient 
Q = Quantity dealt with 
N = Total Number 
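
The three units of analysis can be worked out with a short Python sketch. The income and interest figures are those used in the text. The birth figures (250 births in a population of 10,000) and the function names are hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def rate(events, population, per=1000):
    """Number of events expressed per `per` units of population."""
    return per * events / population


def ratio(a, b):
    """Ratio a : b of two quantities of the same kind, in lowest terms."""
    reduced = Fraction(a, b)
    return reduced.numerator, reduced.denominator


def coefficient(quantity, total):
    """Rate per unit, C = Q / N."""
    return Fraction(quantity, total)


birth_rate = rate(250, 10_000)          # hypothetical figures, per thousand
income_ratio = ratio(80_000, 60_000)    # incomes of Gaurav and Ishan
c = coefficient(8, 100)                 # interest rate of 8 per cent
interest = 10_000 * c                   # interest on 10,000

print("Birth rate per thousand :", birth_rate)
print("Income ratio            :", income_ratio[0], ":", income_ratio[1])
print("Coefficient             :", float(c))
print("Interest on 10,000      :", interest)
```

Exact fractions are used for the ratio and the coefficient so that 80,000 : 60,000 reduces cleanly to 4 : 3 and the interest comes to exactly 800.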

THEORETICAL QUESTIONS 

Long Answer Questions 

1. What is a Statistical Investigation ? Describe the preliminary steps you would 
take in planning a statistical investigation. 

2. Define a statistical unit and explain what would be the essential requirements of a 
good statistical unit. 

3. What do you understand by Statistical Investigation ? Explain its various types. 

Short Answer Questions 

1. What do you mean by Statistical Investigation ? 
2. What is the Statistical Unit ? 
3. Write the names of the various types of statistical investigation. 
4. Give the types of statistical unit. 


OBJECTIVE QUESTIONS 


Choose the correct answers : 


1. The objects and scope of statistical investigation are decided : 

(a) before the compilation of data (b) after the compilation of data 

(c) during the compilation of data (d) never 

2. ‘Rupees per kilogram’ is : 

(a) A simple unit (b) Composite unit (c) Unit of analysis (d) Coefficient 
3. Unit of analysis and interpretation include : 

(a) Coefficient (b) Rate (c) Ratio (d) All of these 


[ Ans. : 1. (a), 2. (b), 3. (d).] 

Census investigation and sample investigation differ on each base as follows : 

1. Inquiry Technique 
   Census investigation : Data are collected from all units of the population and are then analysed. 
   Sample investigation : Some representative units are selected from the population, and data are collected from them and analysed. 

2. Resource 
   Census investigation : More time, labour and money are required. 
   Sample investigation : Much less time, labour and money are required. 

3. Usefulness 
   Census investigation : More useful and suitable when the area of investigation is finite. 
   Sample investigation : More suitable when the area of investigation is extensive or infinite. 

4. Inevitability 
   Census investigation : Inevitable when the number of units in the population is very small (about 10-15). 
   Sample investigation : Indispensable when the units are destructible or infinite (e.g., stars in the sky, water in the sea, blood in the body). 

5. Degree of accuracy 
   Census investigation : Accuracy is of a higher order because all units are studied. 
   Sample investigation : Accuracy is of an ordinary level because each unit of the population is not studied. 

6. Organization and Administration 
   Census investigation : An extensive organization is essential, so various difficulties are faced in its administration. 
   Sample investigation : A rather limited organization is needed, whose administration is easy. 

7. Estimate of Error 
   Census investigation : Statistical error cannot be estimated. 
   Sample investigation : Statistical error can be estimated to a sufficient extent. 

8. Scope of inquiry 
   Census investigation : Inevitable when the scope of inquiry is such that all units must be studied (e.g., census). 
   Sample investigation : Can be adopted when the study of all units is not essential. 

9. Nature of units 
   Census investigation : Suitable in the case of heterogeneous units possessing various attributes. 
   Sample investigation : Suitable in the case of homogeneous units. 


3 


PROCESS OF DATA 
COLLECTION 


— ¢ With Primary and Secondary Data 

— ° Types of Data 

— ° Methods of Collecting Primary Data 
— ° Schedule and Questionnaire 

— °* Methods of Collecting Secondary Data 


MEANING OF DATA 


In earlier times, when statistics was not so developed, the word 
was used in the plural to refer to sets of figures embodying 
quantitative information. For example, when a student writes that he 
has used up-to-date statistics in his topic, statistics here means 
data. A common man also takes any mass of figures to be data. But in 
the science of statistics we may not call every set of numerical 
figures statistical data. All quantitative data are not statistical 
data. Data are statistical when they are studied groupwise. 
Statistical data are the base and subject-matter of the science of 
statistics. Isolated historical data relating to an individual item 
or event as a separate entity are not statistical data. Statistical 
data always correspond to a group of units and are always 
comparable. Webster defined, “Statistics are the classified facts 
representing the conditions of the people in a state, especially 
those facts which can be stated in numbers or in tables of 
numbers or in any tabular or classified arrangement.” 


TYPES OF DATA 


Collection of data is an important task. The process of 
analysis and interpretation depends upon the data collected. The 
collected data should not be insufficient or incorrect, because 
conclusions drawn on their basis will then be wrong and misleading. 
Utmost care must be exercised in the collection of data. There are 
two types of data on the basis of the process of collection : 
(a) Primary data, and (b) Secondary data. 


(a) Primary Data : Primary data are those data which are 
collected by the investigator for his investigation, either by 
himself or through others. The investigator collects the data 
afresh, and as they are collected for the first time they are said to 
be primary data. For example, to know the effect of coaching on 
passing the examination, an investigator collects the marks and 
examination results of the students studying in the colleges and of 
the students not enrolled in the colleges, and draws a conclusion by 
comparing the two. The data collected by him will be called primary 
data. Since primary data are original, the conclusions drawn with 
their help are realistic and dependable, but more money, time and 
energy are required to collect them. Hence primary data are used when 
the area of investigation is small or when a new principle is to be 
derived. 

(b) Secondary Data : Secondary data are those data which have 
already been collected by some institution or agency for its own 
investigation. When these data are used by another investigator 
for his research work, they are said to be secondary data. Thus, in 
the case of secondary data, the investigator neither collects the 
data himself nor has them collected by others, but uses data 
collected and published by some other person or institution. 
Secondary data are therefore not original. For instance, data 
related to the imports and exports of India are published in the 
Reserve Bank of India Bulletin. If an investigator studies the 
balance of payments position of India by analysing the import and 
export data published by the Reserve Bank of India, it will be said 
that his conclusion is based on secondary data. Secondary data may be 
either published or unpublished. 

According to M.M. Blair, “Secondary data are those already in 
existence and which have been collected for some other purpose 
than the answering of the question at hand.” 

Since more money, time and power are needed to collect primary
data, the investigator often uses secondary data in order to save
time, means and power. It should be kept in mind, however, that
conclusions drawn on the basis of secondary data may be mistaken.
The reason is that the investigator does not collect these data
himself, and the persons or institutions that did collect them had
other purposes. Hence the researcher should use secondary data with
special care. In spite of these demerits, most research in modern
times is carried out on the basis of secondary data. The reason is
that data are collected every year in our country by different
institutions such as the Central Statistical Organization (C.S.O.),
the National Sample Survey (N.S.S.), the Reserve Bank of India and
the statistical organizations under different ministries. Data
published by these institutions are considered to be reliable. So, in
order to save time, means and power, researchers use these data in
their research work. To test their reliability, however, some primary
data should also be collected and analysed.


DIFFERENCE BETWEEN PRIMARY AND SECONDARY DATA


Primary Data

(1) Collection : Primary data are collected by the researcher himself
or by investigators acting on his behalf.

(2) Objects : These data are collected according to the objects of
the investigator.

(3) Originality : Primary data are original because they are
collected for the first time.

(4) Time, means and power : More time, means and power are
needed for the collection of primary data.

(5) Reliability : Conclusions drawn on the basis of primary data
are generally real and reliable.

Secondary Data

(1) Collection : Secondary data have already been collected, and
usually published, by some other person or institution.

(2) Objects : Secondary data are merely helpful to the researcher.
Since these data were collected by others for their own purposes, the
investigator needs to adapt them to his purpose.

(3) Originality : Secondary data are like finished goods, because
they have already been used by some person or institution for its
own purpose.

(4) Time, means and power : Since secondary data are already
available, much less time, means and power are required to obtain
them.

(5) Reliability : The conclusions drawn on the basis of secondary
data may be fallacious.

Actually the difference between primary and secondary data is
relative. Primary data once collected by one person become
secondary data in the hands of another. For example, the census data
collected by the Director General of Census are primary data, but
they become secondary data when used by another person or
institution for its own research. Thus the difference between primary
and secondary data is only one of degree. According to Secrist, “The
distinction between primary and secondary data is largely one of
degree. Data which are secondary in the hands of one party may be
primary in the hands of another.”


METHODS OF COLLECTING PRIMARY DATA 


There are five main methods of collection of primary data : 

(1) Direct personal interview method

(2) Indirect oral investigation method

(3) Information from correspondents

(4) Mailed questionnaire method

(5) Schedules to be filled by enumerators


1. Direct Personal Interview Method 

Under this method, the investigator personally comes in contact
with the respondents and interviews them. He collects the
information by asking direct questions of the people from whom the
data are to be collected. For example, if the investigator wants to
collect information about the wages of the labourers working in the
Bhilai steel factory, under this method he will personally contact
the labourers working in this factory and collect the data about their
wages. The data so collected are primary data.


Merits : The following merits are found in the data collected by
the direct personal interview method :

(1) Greater accuracy : There is greater accuracy in the data
collected by this method, because the investigator collects the data
by contacting and interviewing the respondents himself. If the
respondents have any doubt about a question, the investigator can
remove it at once, and so he gets the correct information. If the
investigator feels that a respondent is not answering correctly or is
hiding some facts, he can try to get the correct information by
cross-questioning him. Thus the data obtained by this method are
generally accurate and reliable.


(2) Originality : The data collected by the direct personal
interview method are original because they are collected by the
investigator himself.

(3) Flexibility : Under this method the investigator can change
his questions according to need and so get the required information.
He can obtain essential information by using suitable language and
keeping in mind the educational level and position of the
respondent. Hence this method is flexible.

(4) Other helpful information can be collected : The investigator
can also collect information about the individual qualities,
specialities and environment of the respondent. This information is
very useful in explaining the conclusions of the research.

(5) Homogeneity : The data are homogeneous and comparable
because they are collected by the investigator himself.

Demerits : Following are the demerits of the data collected by
the direct personal interview method :

(1) Chance of personal bias : There is every chance that the
personal prejudice and bias of the investigator will enter under this
method. The conclusions drawn from data collected in a biased way
may therefore be fallacious.

(2) May not reveal the characteristics of the population : The
direct personal interview method can be adopted only in a small
field of investigation. The data collected in such a small field may
not reveal all the characteristics of the population, in which case the
conclusions drawn may be fallacious.

(3) Expensive method : If the number of persons to be
interviewed is large and the field of investigation is wide, this
method may be very expensive. More time and money are required
to collect the data by contacting the respondents and questioning
them directly.

Suitability : This method is suitable for those investigations where

(1) More emphasis is given to the accuracy of the data.

(2) The field of investigation is small and more emphasis is given
to intensive study.

(3) The personal presence of the investigator is needed because of
the complexity of the investigation.

(4) Data are to be kept secret.

(5) More emphasis is given to the originality of the data.


Precautions : The following precautions are necessary for the
investigator while using this method :

(1) The investigator must be tactful, hard-working and patient so
that he may win the confidence of the respondents and get their
co-operation in the collection of the data.

(2) Questions should be simple, clear and short. Questions that
may offend the respondent should not be asked.

(3) Cross-questions should be asked to test the truthfulness of
doubtful answers.

(4) The investigator should keep his individual views and biased
judgment out of the enquiry.

(5) The investigator should know the respondents’ food habits,
dress, language and traditions well. By mixing with them he will win
their confidence and succeed in obtaining correct data according to
his aim.


2. Indirect Oral Investigation Method 


Under this method the investigator contacts a third person or
witness who is capable of giving the necessary information. This
method is generally adopted where the required information is of a
complex nature or the informants are unwilling to respond if
approached directly. The information about the persons concerned
is therefore obtained through other knowledgeable persons, who are
called witnesses. For example, if the investigator is enquiring about
the habit of drinking alcohol, people may be reluctant to answer
about their own drinking habits and the investigator would not
obtain the necessary information. He can instead collect the
information from liquor dealers or from the neighbours of the
persons concerned, and draw his conclusions on that basis. Similarly
the police obtain information about thefts or murders by
interrogating persons other than the thieves and murderers. The
same method is used by enquiry commissions and committees
appointed by the government to get the information needed for
their enquiries.


Merits : Following merits are found in this method :

(1) Economical : Less time, money and power are spent under
this method because the investigator draws his conclusions from
information collected from witnesses.

(2) Solution of complex problems : This method is useful in
solving the complex problems under study.

(3) Opinion of experts : In this method the opinions and
suggestions of experts are obtained.

(4) Less chance of bias : There is less chance of the individual
bias of the investigator entering into it.

(5) Secret information : Under this method some secret
information can be obtained even when the respondents are
reluctant to give it. For example, alcohol drinkers do not themselves
tell about their habit, but the information may be collected by
asking their neighbours.

Demerits : Following demerits are found in this method :

(1) Less accuracy : Under this method the investigator does not
collect information by asking the informants directly but by asking
third parties. Hence the information collected is less accurate.

(2) Possibility of bias : The information is collected from third
parties, whose bias may affect the investigation.

(3) Rough information : Since the informants are reluctant to
respond, only rough information may be obtained instead of the
actual facts.

(4) Lack of uniformity : Separate information is collected from
different persons, hence the data may not be uniform. Owing to this
lack of uniformity, it is difficult to draw correct conclusions.

Suitability : This method is suitable in such cases where :

(1) The field of investigation is very large.

(2) Informants are not interested in responding.

(3) Informants are not capable of answering the questions.

(4) The nature of the data is complex.


Precautions : The following precautions should be taken by the
investigator in using this method :

(1) The information furnished by the witnesses should not be
accepted without evidence.

(2) It should be ascertained that the witnesses have full
knowledge of the facts and are willing to give the information.

(3) The witnesses should not be biased in favour of or against the
subject.

(4) Witnesses should be mentally fit and capable of understanding
the questions and answering correctly.

(5) The number of witnesses should be sufficient.

(6) The investigator should win the sympathy and confidence of
the witnesses.

(7) The investigator should work with patience, skill,
impartiality and politeness.


3. Information through Correspondents

Under this method, the investigator appoints local agents or
correspondents in different places to collect information. These
correspondents collect the local information and submit it to the
central office. The investigator classifies and analyses this
information and draws conclusions for his investigation. The
correspondents are paid by the investigator. Newspapers and
magazines adopt this method. It is also used by those government
departments which collect data regularly over a wide area. For
example, to construct the wholesale price index number the
government appoints representatives in various places of the
country, who regularly send data on the prices of commodities to
the departments concerned. This method is also used to estimate the
crop in a particular year.


Merits : There are following merits :

(1) Greater scope : This method is useful when the field of
investigation is very wide and the data are collected from time to
time at intervals.

(2) Economical : Less time, money and labour are used in the
collection of information under this method. So this method is more
economical.

Demerits : There are following demerits :

(1) Lack of originality : The data obtained through
correspondents may not be original because correspondents often
send data on the basis of presumption.

(2) Effect of bias : If the correspondents are biased, the data sent
by them may be inaccurate. Hence the conclusions drawn on the
basis of these inaccurate data may not be correct.

(3) Information is delayed : Sometimes under this method the
investigator receives the information from the correspondents so late
that it has lost its significance.

The biggest drawback of this method is that the conclusions drawn
from data collected in this way may not always be accurate and
real, because the personal presumptions and bias of the
correspondents strongly affect the data.


Suitability : This method is generally suitable for those
departments where data are collected regularly, at definite intervals,
from a wide area.

Precautions : Under this method, the investigator should take
the following precautions :

(1) The correspondents should be free from personal
preconceptions and bias.

(2) Correspondents should have the ability to understand the
problem and send information accordingly.

(3) Conclusions should be drawn only after testing the
information sent by different correspondents.


4. Mailed Questionnaire Method 


Under this method the investigator prepares a list of questions
pertaining to the enquiry in order to collect the data. This list is
known as a questionnaire. He sends the questionnaire to the
informants by post. The questionnaire contains the questions and
provides space for the answers. The informants duly fill in the
blank spaces and send the questionnaire back to the investigator. A
covering letter is sent with the questionnaire requesting the
informants to fill it in and return it within a fixed time. Through
this letter the informants are assured that the information sent by
them will be kept confidential. A stamped, self-addressed envelope
is also sent for the return of the questionnaire.


Merits : The following are the merits of this method :

(1) Suitable for extensive field : This method is suitable for
those fields of investigation where the informants are spread over a
vast area. The information is obtained by sending the questionnaire
to them.

(2) Economical : Less time, money and labour are required in
this method. The investigator has to bear only the expenses of
printing the questionnaire, sending it to the informants and getting
it back from them. In the direct personal interview method, by
contrast, the investigator has to go to the informants, and when
information is collected through correspondents, they have to be
paid a regular salary. So this method is very economical.

(3) Less chance of inaccuracy : Since the information is given by
the informant himself by filling in the questionnaire, there is less
chance of inaccuracy in the information.

(4) Originality : Since the informant fills in the information
himself, the data so obtained are original and free from the bias of
the investigator.

Demerits : There are following demerits :

(1) Lack of interest in informants : Sometimes the informants
are not willing to give written information and send back the
questionnaire, and the information they do supply may not be
correct. Only 10 to 20 per cent of the questionnaires may be filled in
and returned. There are more chances of error in conclusions drawn
from improperly filled questionnaires.

(2) Information depends on questionnaire : If the questionnaire
is simple, the informant will give correct answers; if it is complex,
the informant may give wrong answers. Since the investigator is not
present before the informants to explain the questionnaire, the
conclusions drawn may be fallacious owing to wrong filling of the
questionnaire by the informant.

(3) Effect of bias : If the informant is biased, the conclusion
drawn on the basis of data sent by him may be wrong.

(4) Fear of informants : People are afraid of giving written
information because they feel that it may be used against them. In
such a situation they do not send the information.

(5) Lack of flexibility : This method is not flexible, because on
receipt of incomplete information it is not possible to ask the
informants supplementary questions. The reason is that under the
questionnaire method neither the investigator nor any trained
person is present before the informants.

(6) Limited use : This method can be used only when the
informants are educated, because only they can fill in the
questionnaire and send the answers in writing. It cannot be used for
illiterate persons. Hence the use of this method is limited.

Suitability : This method is suitable only for those investigations
where the informants are literate and spread over a wide area. Big
companies in India use this method to know the opinion of
consumers about their products.

Precautions : Special attention should be given to the following
precautions under this method :

(1) The sympathy and active co-operation of the informants are
essential. The language of the investigator should be polite and
effective so as to get their co-operation.

(2) The questions in the questionnaire should be simple, clear and
effective.

(3) A stamped envelope should be provided with the
questionnaire so that the informants can send it back easily and
quickly.

(4) The number of informants should be large.

(5) There should be provision for a prize or other encouragement
for informants who fill in the questionnaire properly and send it
back.

(6) The accuracy of the information given by the informants
should be tested.
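
Precaution (4) can be made concrete with a little arithmetic. The following is only an illustrative sketch, not part of the prescribed method: the function name `questionnaires_to_mail` and the target of 200 replies are assumptions chosen for the example, and the 10 to 20 per cent return rate is the figure mentioned above under demerits.

```python
def questionnaires_to_mail(target_replies: int, return_rate_percent: int) -> int:
    """Estimate how many questionnaires to post so that, at the assumed
    return rate, about `target_replies` filled questionnaires come back.

    Hypothetical helper for illustration; the return rate is an assumption
    supplied by the investigator, not a guaranteed figure.
    """
    if target_replies < 0:
        raise ValueError("target_replies must not be negative")
    if not 0 < return_rate_percent <= 100:
        raise ValueError("return_rate_percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_replies * 100 // return_rate_percent)


if __name__ == "__main__":
    for rate in (10, 20):
        needed = questionnaires_to_mail(200, rate)
        print(f"Return rate {rate}%: post about {needed} questionnaires")
```

Under these assumptions, an investigator who needs 200 filled questionnaires would have to post between 1,000 and 2,000 of them, which is why a large number of informants is advised.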


5. Schedules to be Filled in by Enumerators 


This method is quite similar to the fourth one. The only difference
is that under the mailed questionnaire method the informants fill in
the answers themselves, while under this method the answers are
filled in by trained enumerators who put the questions to the
informants. The difficulties of inaccuracy and inadequate filling of
the questionnaire are thus removed. The enumerators are first
trained and then sent to different fields. They go to their fields,
contact the informants and fill in the questionnaire by asking them
the questions. The success of this method depends on the
enumerators, so they should be skilled, hard-working and tactful.
They should be able to collect correct information by explaining the
questions to the informants, and should avoid recording mutually
contradictory information. The enumerators should have good
knowledge of the way of life, food habits, traditions and language of
the residents of the area.

Merits : 


(1) Suitable for extensive field : This method is suitable for
collecting information over an extensive field because enumerators
can be sent to distant places.

(2) High level of accuracy : The data have a high level of
accuracy because they are collected by trained enumerators.

(3) Individual contact : The enumerators contact the informants
personally, so they can obtain accurate and reliable answers even to
complex questions.

(4) Less chance of bias : The enumerators are skilled and trained,
so there is little chance of bias.

(5) In this method information may also be collected from
uneducated informants.

Demerits :

(1) Expensive method : In this method the enumerators have to
be trained and employed, so it is very expensive.

(2) Time consuming : In this method more time is required to
collect the data, because the enumerators are first trained and then
employed to collect information from different fields by contacting
the informants.

(3) Bias : If the enumerators are biased, this will affect the
conclusions, which may then be unreliable.

Precautions : 

(1) The enumerators should be skilled, honest, hard-working and
tactful.

(2) A filled questionnaire should be given to the enumerators as a
specimen.

(3) Questions should be simple, clear and few.

(4) The enumerators should be trained properly.

(5) The work of the enumerators should be supervised properly.

(6) The enumerators should have knowledge of local customs.


SELECTION OF SUITABLE METHOD 


Which of the above five methods is the most suitable for collecting
data ? It is a serious question. No single method can be considered
the most suitable, because different methods are good in different
situations. At the time of selecting a method, the following points
should be kept in mind :


(1) Nature of investigation : If the nature of the investigation is
such that personal contact is essential, the direct personal interview
method will be the right choice. If the informants are educated and
reliable, the information may be collected by sending a
questionnaire by post. If, on the contrary, the informants are
illiterate, the primary data may be collected by appointing
enumerators to fill in the questionnaire.

(2) Field of investigation : If the field of investigation is small,
the direct personal interview method will be suitable; if it is wide,
indirect oral investigation will be suitable. If the field of
investigation is scattered, the data may be collected by appointing
local correspondents.

(3) Degree of accuracy : If a high degree of accuracy is required,
the personal interview method will be suitable; if a reasonable
degree of accuracy suffices, indirect oral investigation, schedules
filled by enumerators or the mailed questionnaire method may be
adopted.

(4) Available money : If funds are limited, the indirect oral
investigation method will be suitable, or the information may be
collected by sending a questionnaire by post. If finance is sufficient,
the direct personal interview method, or for a wide field the method
of schedules filled by trained enumerators, may be adopted.

(5) Available time : If very little time is available for the
investigation, the indirect oral investigation method will be
suitable; if time is sufficient, other methods may be used.

(6) Purpose of investigation : If the purpose of the investigation is
to introduce a new theory, the direct personal interview method will
be the best; if the field of investigation is wide, the data may then be
collected by appointing trained enumerators. If the purpose is to
verify or support principles already introduced, the information may
be collected by sending a questionnaire by post or through local
correspondents.

Thus the collection of primary data is affected by many factors, but
its success depends on the individual ability and experience of the
investigators. Hence Dr. Bowley stated, “In collection (and
tabulation) common sense is the chief requisite and experience is the
chief teacher.”
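
The considerations above can be summarised as a simple scoring exercise. The sketch below is purely illustrative and is no substitute for the investigator's judgment: the function `suggest_methods`, its four parameters and the equal weight given to each point are assumptions made for this example, and only the points on field, nature, accuracy and money are encoded.

```python
METHODS = (
    "direct personal interview",
    "indirect oral investigation",
    "information through correspondents",
    "mailed questionnaire",
    "schedules filled by enumerators",
)


def suggest_methods(field, informants_literate, high_accuracy, funds_sufficient):
    """Return the method(s) favoured by the largest number of points.

    field: "small", "wide" or "scattered" (hypothetical labels).
    The other three arguments are True/False. Each point adds one vote
    to the method(s) the text recommends for that situation.
    """
    votes = dict.fromkeys(METHODS, 0)

    # Field of investigation
    if field == "small":
        votes["direct personal interview"] += 1
    elif field == "wide":
        votes["indirect oral investigation"] += 1
    elif field == "scattered":
        votes["information through correspondents"] += 1
    else:
        raise ValueError("field must be 'small', 'wide' or 'scattered'")

    # Nature of investigation: literate informants can fill in a
    # questionnaire themselves, illiterate informants need enumerators.
    if informants_literate:
        votes["mailed questionnaire"] += 1
    else:
        votes["schedules filled by enumerators"] += 1

    # Degree of accuracy
    if high_accuracy:
        votes["direct personal interview"] += 1
    else:
        for m in ("indirect oral investigation",
                  "schedules filled by enumerators",
                  "mailed questionnaire"):
            votes[m] += 1

    # Available money
    if funds_sufficient:
        votes["direct personal interview"] += 1
        votes["schedules filled by enumerators"] += 1
    else:
        votes["indirect oral investigation"] += 1
        votes["mailed questionnaire"] += 1

    best = max(votes.values())
    return [m for m in METHODS if votes[m] == best]


if __name__ == "__main__":
    print(suggest_methods("small", True, True, True))
    print(suggest_methods("wide", True, False, False))
```

A tie, as in the second call, simply means that the points considered do not separate the methods; the investigator then decides on the remaining points such as time and purpose.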


SOURCES OF COLLECTING SECONDARY DATA 


In most of his studies the investigator does not use primary data.
The reason is that data on related subjects are published from time
to time by major institutions, agencies and government departments.
Since the data published by these institutions are considered to be
more reliable, investigators draw their conclusions by analysing
these published data, saving their time, means and labour.

There are two main sources of secondary data :

(1) Published sources

(2) Unpublished sources


Published Sources 

Important data are published from time to time by different
government and non-government institutions and by investigators.
These published data are used by the investigator for his
investigation and are known as published sources of secondary data.
Following are the main sources of published data :


(1) International publications : Various types of information are
published from time to time by international organizations and
institutions such as the International Labour Organisation (I.L.O.),
the International Monetary Fund (I.M.F.), the World Bank etc. The
data published by these organisations are used for investigations.

(2) Government publications : Various departments of the
government of each country regularly collect and publish statistics
on a number of subjects. These statistics are very important and
reliable. The main government publications are : Statistical Abstract
of India (annual), Census of India, Reserve Bank of India Bulletin,
Indian Trade Journal, Annual Survey of Industries etc.

(3) Semi-official publications : The data published by local
bodies such as municipal corporations, District Boards etc. come
under this head. These data relate to births and deaths, health,
education and income and expenditure.

(4) Reports of commissions and committees : Various
commissions and committees are constituted by the government or
other institutions to study different problems. These commissions
and committees collect data on the problems concerned and present
them in their reports, e.g. Report of Finance Commission, Report of
Monopoly Commission etc.

(5) Trade institutions : Trade associations and business houses,
e.g. Indian Industry, Commerce Association, Jute Mills’ Association,
Hindustan Lever Ltd., Birla Group, Tata Association Ltd. etc.,
collect different types of data and publish them.

(6) Research institutions : Universities, research bureaus and
research institutions collect different types of data and publish
them, for example the data published by the National Council of
Applied Economic Research (N.C.A.E.R.) and the Indian Statistical
Institute (I.S.I.).

(7) Papers and magazines : Many papers and magazines also
publish data. For example, Economic Times, Industrial Times,
Commerce etc. collect and publish data in particular fields.

(8) Research scholars : Most research scholars publish their
research work.

Unpublished Sources 


Not all statistical data are published. The data collected by
various government departments or private institutions are often
maintained in the form of records for departmental use and are not
published. The investigator can use these unpublished data for his
study. Thus the investigator can obtain secondary data from both
sources : published and unpublished.


MERITS OF SECONDARY DATA 


(1) Economical : Using secondary data for research is economical
because the expenses of collecting data, such as training and paying
enumerators, preparing the questionnaire and sending it by post,
need not be borne.

(2) Less time consuming : If secondary data are available, they
can be obtained more quickly than primary data and conclusions
can be drawn within a short time by analysing them, while three or
four months may be consumed in the collection of primary data.
Primary data also require more time for classification, tabulation,
analysis and the drawing of conclusions.

(3) Availability : On some subjects data are available as secondary
data although it is generally impossible to collect them as primary
data. For example, a single person or research institution cannot
collect census data, but can obtain them from government
publications.


DEMERITS OF SECONDARY DATA 


There are two main difficulties in the use of secondary data : 

(1) It is generally difficult for investigators to obtain adequate data 
according to their studies. 

(2) It is always doubtful that the secondary data will be sufficient, 
accurate and reliable according to the purpose of the investigation. 


PRECAUTIONS IN THE USE OF SECONDARY DATA


Investigators are required to take extra precautions while using
secondary data because secondary data may be erroneous. The
errors may arise, for example, from a change in the definition of the
statistical unit, incompleteness of information, bias, or differences of
area and purpose. So before secondary data are used it should be
tested whether they are reliable, sufficient and suited to the purpose.
Only when these questions are answered satisfactorily should
secondary data be used in the investigation. Professor Bowley
pointed out, “It is never safe to take published statistics at their face
value without knowing their meaning and limitations and it is
always necessary to criticise arguments that can be based upon
them.”

The following points should be taken into consideration while using 
the secondary data : 

(1) Which institution, agency or person collected and published
the data ? Were these data collected originally or taken from
another primary source ? This should be tested.

(2) What was the original purpose of collecting the primary data ?
Can these data be used for the present purpose in the same form
without fresh tabulation ? This should be tested.

(3) Which method was adopted to collect the primary data ? Was
that method appropriate ? This fact should be tested.

(4) Was the census method or the sampling method adopted to
collect the primary data ? The census method is highly reliable. If
the sampling method was used, was the sample size appropriate
and the sampling method suitable ? All these questions should
be checked.

(5) When were the primary data originally collected ? Was that
time a normal period ? Is that period also relevant for the present
study ?

(6) What was the degree of accuracy at the time of collection of
the primary data ? Is that degree of accuracy also suitable for the
present study ? This should also be tested.


(7) Which statistical unit was used at the time of collection of
the primary data ? What was the unit of analysis ? Were the units of
collection and analysis well defined ? Can they be used for the
present study ?

(8) Are data published by other sources also available on the
same subject ? If yes, both should be compared to verify which
data are reliable.

(9) Even if the data appear reliable, they should be examined
before they are used.

After testing the above facts if the secondary data seem reliable, 
sufficient and suitable then they should be used in the investigation. 


THEORETICAL QUESTIONS 
Long Answer Questions 

1. How many types of data are there ? Differentiate them. 

2. Define primary data. Explain merits and demerits of various methods of 
collecting primary data. 

3. Explain methods generally used for collecting data. 

4. State the meaning of secondary data. What precautions should be taken 
before using secondary data for further investigation ? (Bhopal 2009) 

5. What do you understand by secondary data ? State various sources of 
secondary data. What precautions are taken while using them ? (Jiwaji 2009) 

6. What precaution should be taken while using published data ? 

7. Give various sources of secondary data. 

8. “Statistics are numerical statements of facts, but all facts numerically stated 
are not statistics.” Clarify this statement and point out briefly which numerical 
statements of facts are statistics. (Indore 2004) 

9. Distinguish between Primary and Secondary data. Explain the various
methods of collecting primary data and point out their merits and demerits.
(Bilaspur 2009; Indore 2004; Jiwaji 2005)


10. Distinguish between primary and secondary data. Mention the various 
sources of the secondary data. 

11. “I have no faith in Statistics.” In the light of this statement, explain the use
and misuse of Statistics. (Indore 2000)

12. What are the main limitations of statistics ? Can these shortcomings be
overcome ? (Bilaspur 2006)


13. Statistics affects everybody and touches life at many points. Comment.
(Ravishankar 2003)

14. Distinguish between primary and secondary data. (Ravishankar 2005) 

15. Write short notes on primary and secondary data. 


16. Write short notes on :

(i) Direct personal investigation method (ii) Sources of secondary data
(iii) Precautions while using secondary data (iv) Definition of statistics
(v) Limitations of statistics (vi) Indirective statistics
(vii) Distrust of statistics (viii) Types of data (Bhopal 2005, 06)

17. Explain various methods used for collecting primary data with merits and
demerits.

Short Answer Questions

1. What do you understand by primary data ?
2. What is secondary data ?
3. What is the direct personal interview method ?
4. What is the indirect oral investigation method ?
5. What do you understand by a schedule ?
6. What is a questionnaire ?
7. Describe the merits and demerits of secondary data.
8. What precautions should be taken at the time of use of secondary data ?


OBJECTIVE QUESTIONS 


Choose the correct answers 
1. What is prepared by the government with the help of data ?
(a) Budget (b) Province (c) Money (d) Officer 


2. Which of the following is not a characteristic of data ?
(a) Data are numerically expressed.

(b) There is reasonable standard of accuracy in data.

(c) Data are accepted by enumeration or estimation. 

(d) Data are personal facts. 


3. The data collected by the investigator himself is called : (Kanpur 2004; Avadh 2004)
(a) Primary (b) Secondary (c) Both Primary and Secondary (d) None of the above 


4. The method for collecting primary data is : 
(a) Direct personal investigation (b) Indirect oral investigation 
(c) Information through local correspondents (d) All the above 


5. The data obtained from Financial Express are : 
(a) Primary (b) Secondary (c) Both (a) and (b) (d) None of these 


6. The sources of data are : (Avadh 2004) 
(a) Primary only (b) Secondary only (c) Primary and Secondary (d) None of these 


7. One of the important steps of statistical research is :
(a) Collection of data (b) Local correspondent (c) Enumerator (d) None of these

8. In collection of statistical data the chief requisite is : 

(a) Common sense (b) Published sources (c) Unpublished sources (d) None of these


9. Statistical data are of what type ? 
(a) 2 (b) 5 (c) 3 (d) None of these 


10. Who collects primary data ? 
(a) Investigators (b) Enumerators (c) Both (a) and (b) (d) None of these


11. The data collected in the past but being used in the present investigation will
be known as :
(a) Primary data (b) Secondary data (c) None of the two (d) Imaginary data


12. Sources of secondary data are : (Avadh 2004)
(a) Data obtained by survey for the purpose (b) Data obtained by personal investigation
(c) Production records of a firm (d) All of these


[Ans. 1. (a), 2. (d), 3. (a), 4. (d), 5. (b), 6. (c), 7. (a), 8. (a), 9. (a), 10. (c),
11. (b), 12. (c).]


METHODS OF SAMPLING


• Methods of Sampling

• Size of Sample

• Test of Reliability of Sample

• Preparation, Specimen and Technique of Using Questionnaire


MEANING OF SAMPLING 


Much of our work and knowledge in day-to-day life is based, to
some extent, on samples, and we use samples successfully in many
areas of life. Each and every grain in a bag of rice cannot be examined
at the time of purchase. The aroma of the rice, the size of its grains
and whether it is infested with weevils can all be judged from a
handful of rice. By meeting a person for 4 to 6 hours before marriage,
his behaviour, views, character, personality, etc. can be judged and a
decision can be taken whether to marry or not. By seeing a small piece
of cloth from a lot, the quality of the whole lot can be judged. At the
time of purchasing cotton, mill owners can decide to buy quintals of
cotton on the basis of a sample of 5 kg. A doctor examines two drops
of blood and gives information about a disease present in the body.
These are a few examples of the use of samples in daily life. In the
first example, the bag of rice is the population and the handful of
rice is a sample. Selecting a small part of the population or
aggregate which represents all the characteristics of that population
is known as sampling. Conclusions about the population are drawn
from the study of the sample, and an effort is made to get the maximum
information about the population by observing the sample.

Essentials of sampling : The inferences drawn about the
population with the help of a sample study should be correct and
useful. For this, the selected sample should have the following
essentials :

(i) Representativeness : All the characteristics of the population
should be present in the selected sample. This is possible only when
all the units of the population have an equal chance of being
selected in the sample. A sample thus selected will truly represent
the population.

(ii) Independence of units : All the units selected in the sample
should be completely independent. They should not be dependent on
each other.

(iii) Adequacy : The sample size should be optimum, i.e., neither
too large nor too small. If the sample size is very large, the
expenditure of the investigation will increase and more time will be
required. If the sample size is small, it may not represent the
characteristics of the population and the conclusions will be false
and fallacious.

(iv) Similar regulating conditions : The regulating conditions of
all units of the selected sample should be the same.

(v) Homogeneity of units : Homogeneity between the units of the
population and those of the sample is essential, i.e., there should not
be basic differences in the nature and characteristics of the units of
both.

METHODS OF SAMPLING 


The main methods of selecting representative units from the
population can be presented through the following classification :


Methods of Sampling


1. Deliberate sampling

2. Random sampling
   (i) Lottery system
   (ii) By rotating drum
   (iii) By random numbers
   (iv) Systematic method

3. Mixed sampling
   (i) Stratified sampling
   (ii) Multi-stage sampling
   (iii) Multi-phase sampling
   (iv) Cluster sampling

4. Other methods
   (i) Extensive sampling
   (ii) Quota sampling
   (iii) Convenience sampling
   (iv) Area sampling
   (v) Sequential sampling
   (vi) Self-selected sampling
   (vii) Line sampling


1. Deliberate Sampling 

According to this method, the work of data collection depends on
the judgement and skill of the investigator. The investigator uses his
judgement and skill to select such units from the population as will
cover all the characteristics of the population. Which units are to be
included in the sample and which are not depends entirely on the will
of the investigator. For this, he decides a standard and then selects
the units for data collection according to that standard. Thus, when
some units are selected from the population for a specific purpose
according to the investigator’s will, it is known as deliberate or
purposive sampling. This method should be adopted in practice only
when it is feared that a simple random sample may miss some
important facts and units whose deep study is essential. The success
of this method depends on the experience, judgement, skill, honesty
and unbiasedness of the investigator.

Merits of the method : 


(1) Simplicity : This method is very simple. The investigator
selects the representative units according to his judgement and will.

(2) Reduction in expenditure : In this method, the selection of
units is predetermined and small in number, so less time, labour and
money are required.

(3) Suitable for specific study : This method is suitable for those
enquiries where deep and specific study of a particular unit or some
units is essential for research.

(4) Proper selection by standard decision : In this method, if a
standard is decided according to the purpose of the inquiry by proper
planning in the beginning, the selection of the sample will be
correct and proper.

Demerits of the method : 

(1) Doubt about biasedness : In this method the personal
prejudice and ideas of the investigator may influence the sample
selection, so the selected sample may be biased.

(2) No possibility of calculation of sampling error : In this
method the selection of the sample is not based on probability but on
personal judgement, so the calculation of sampling error is not
possible.

(3) No possibility of comparison : The conclusions obtained
through this method can’t be compared with the conclusions drawn
from data collected through other methods.

(4) Erroneous and fallacious conclusion : In this method there
is always a possibility of bias on the part of the investigator, and
each unit of the population does not get an equal and fair chance of
being selected, so the inferences are likely to be erroneous and
fallacious.

2. Random Sampling 

In this method each unit of the population has an equal
opportunity of being selected in the sample. In this technique the
selection of units is based entirely on chance or probability. No
importance is given to the will of the investigator, so there is no
possibility of personal bias. Since the selection of units is not in
the hands of the investigator, i.e., a human being, but is left to
chance, it is known as random sampling. According to Dr. F. Yates,
“Every member of a parent population has had an equal chance of being
included.” According to Croxton and Cowden, “Sampling is said to be
random if each possible sample (combination of a given number of
items) has the same probability of being drawn.” According to C.H.
Mayers, “A sample is said to be random when each unit as drawn has
probability identical to the probability of all the other units which
might have been drawn in its place.”


A sample selected on the basis of random sampling has the
capacity to represent the whole population because all types of units
have got an equal opportunity of selection in such a sample. Under
this method the sample can be taken by the following four methods,
which we call the methods of random sampling.


(i) Lottery system : This is a very simple and handy method of
random sampling. Under this method all units of the population are
named on separate slips of paper or numbered 1, 2, 3, 4, ... and
all these slips are then folded and mixed up. A blindfold selection of
as many slips as the desired sample size is then made by an unbiased
person or by the investigator himself. For example, suppose 20
villages out of 200 villages are to be selected. We will prepare
separate slips of paper for the 200 villages, fold them, mix them
thoroughly and then make a blindfold selection of 20 slips. The
villages named on these slips will be selected for the investigation.
In this technique, it is essential to keep in mind that all slips
should be of identical size and shape, and that they should be folded
in the same way and mixed up thoroughly. The person who selects the
slips must do this work without any bias.
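
The lottery system can be imitated on a computer. The following is only a minimal sketch: the village labels and the function name `lottery_sample` are hypothetical, and Python's `random.sample` stands in for the blindfold draw of folded slips.

```python
import random


def lottery_sample(units, sample_size, seed=None):
    """Draw `sample_size` units without replacement, like picking folded slips."""
    rng = random.Random(seed)  # the seed is used only to make the illustration repeatable
    return rng.sample(units, sample_size)


# Hypothetical labels for the 200 villages of the example
villages = [f"Village {i}" for i in range(1, 201)]

chosen = lottery_sample(villages, 20, seed=1)
print(chosen)
```

Every village has the same chance of appearing in `chosen`, and no village can be drawn twice, which matches drawing slips without putting them back.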


(ii) By Rotating Drum : In this technique, the names or numbers of
all units of the population are written on wooden or iron pieces,
which are then put in a drum. The drum is rotated by hand or by a
machine so that all the pieces are mixed up thoroughly. The required
number of pieces is then drawn from the drum by hand or by a
mechanical device. For unbiasedness it is also essential here that
the wooden or iron pieces should be of identical size and shape.

(iii) By Random Numbers : In this technique units are selected
through random numbers. Various statisticians and organizations
have prepared tables of these random numbers. Out of these,
Tippett’s table is the most popular. Some of the important
series of random numbers are as follows :

(a) Fisher and Yates Table : In this table, there are 15,000
numbers having 2 digits.

(b) Kendall and Smith Table : This table consists of 1,00,000
numbers having 2 digits and 4 digits.

(c) Tippett’s Table : There are 10,400 sets of numbers having
four digits.

(d) Rand Corporation table of 10,00,000 random numbers.

(e) Snedecor’s table of 10,000 random numbers having five
digits.

In these tables of random numbers, the numbers are written one
by one without any order and the required samples are selected from
them. For example, the first 35 numbers of Tippett’s Table are as
follows :

2952 6641 3992 9792 7969 5911 3170 

5624 4167 9524 1545 1396 7203 5356 

1300 2693 2370 7483 3408 2762 3563 

1089 6913 7691 0560 5246 1112 6107 

6008 8126 4233 8766 2754 9143 1405 

For instance, if a sample of 16 units out of a population of 4,000
units is to be selected, then first of all these 4,000 units are
serially numbered from 0001 to 4000 and then we draw the first 16
numbers from this table which are not more than 4000. These
numbers for this example will be as follows :

2952, 3992, 3170, 1545, 1396, 1300, 2693, 2370, 

3408, 2762, 3563, 1089, 0560, 1112, 2754, 1405 
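
The selection above can be checked mechanically. The sketch below is an illustration only: the function name `select_from_table` is hypothetical, and skipping repeated numbers is an added assumption (the 35 numbers quoted contain no repeats, so it does not affect this example).

```python
# First 35 four-digit numbers of Tippett's table, as quoted in the text
TIPPETT_FIRST_35 = [
    "2952", "6641", "3992", "9792", "7969", "5911", "3170",
    "5624", "4167", "9524", "1545", "1396", "7203", "5356",
    "1300", "2693", "2370", "7483", "3408", "2762", "3563",
    "1089", "6913", "7691", "0560", "5246", "1112", "6107",
    "6008", "8126", "4233", "8766", "2754", "9143", "1405",
]


def select_from_table(table, population_size, sample_size):
    """Read the table in order and keep numbers from 1 to population_size."""
    selected = []
    for entry in table:
        number = int(entry)
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
            if len(selected) == sample_size:
                break
    return selected


sample = select_from_table(TIPPETT_FIRST_35, 4000, 16)
print(sample)
# [2952, 3992, 3170, 1545, 1396, 1300, 2693, 2370,
#  3408, 2762, 3563, 1089, 560, 1112, 2754, 1405]
```

The number printed as 560 is the table entry 0560, i.e., the 560th unit of the population.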

If the units of the population are fewer than 1000 : In such a
situation the units are assigned numbers from 0001 to 0999 according
to the size of the population, and numbers are then selected from
Tippett’s table in such a way that they do not exceed the population
size and are as many as the sample size. For instance, if a sample of
10 units is to be selected from a population of 300 units, then the
population units are assigned numbers from 0001 to 0300 and the first
10 numbers not exceeding 300 are selected from Tippett’s table. Thus,
we will select a sample of size 10.

If the units of the population are fewer than 100 : If the
population size is less than 100, then we select the numbers from the
table by dividing each 4-digit number into two 2-digit numbers and
writing down the numbers in pairs until the sample size is reached.
For instance, if 12 units of a sample are to be selected from a
population having 90 units then, starting with the first row, we
select the first 12 numbers which are not more than 90, in the
following way :

29, 52, 66, 41, 39, 79, 69, 59, 11, 31, 70, 56. 
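
The splitting of 4-digit numbers into 2-digit pairs can be sketched as follows. The function names are hypothetical, and only the first 14 numbers of the table quoted earlier are needed for this example.

```python
# First two rows of Tippett's table, as quoted in the text
TIPPETT_ROWS = [
    "2952", "6641", "3992", "9792", "7969", "5911", "3170",
    "5624", "4167", "9524", "1545", "1396", "7203", "5356",
]


def two_digit_pairs(table):
    """Split every 4-digit entry into its left and right 2-digit halves."""
    for entry in table:
        yield int(entry[:2])
        yield int(entry[2:])


def select_two_digit(table, population_size, sample_size):
    """Keep the first `sample_size` pairs that fall within the population."""
    selected = []
    for number in two_digit_pairs(table):
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
            if len(selected) == sample_size:
                break
    return selected


pairs = select_two_digit(TIPPETT_ROWS, 90, 12)
print(pairs)  # [29, 52, 66, 41, 39, 79, 69, 59, 11, 31, 70, 56]
```

The pairs 92, 97 and 92 are passed over because they exceed the population size of 90.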


(iv) Systematic method : In this method, first of all, the units of
the population are arranged serially. Now we find a specific class
interval with the following formula :

$$\text{Class Interval} = \frac{\text{Number of all units in the population}}{\text{Number of units in the sample}}$$

The population is divided into various classes according to the
figure obtained by the above formula. After this a number is selected
at random from the first class. The next number is selected by adding
the class interval to the selected number. Similarly, for each
following unit, the class interval is added to the preceding unit.
This is continued until a sample of the required size is obtained.
For example, if 10 students are to be selected from a population of
size 80, then the class interval is determined as
class interval = 80/10 = 8.

Now the 80 students are arranged serially into 10 classes of 8
students each. Suppose serial number 6 is chosen at random from the
first class. Then the first unit of the sample will be the 6th unit,
the second is the 6 + 8 = 14th unit, the third is the 14 + 8 = 22nd
unit and the fourth is the 22 + 8 = 30th unit. Thus the required
sample will be 6, 14, 22, 30, 38, 46, 54, 62, 70, 78.
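
The systematic method reduces to repeated addition of the class interval. A minimal sketch, assuming the population size is an exact multiple of the sample size (as in the example of 80 students) and using a hypothetical function name:

```python
def systematic_sample(population_size, sample_size, start):
    """Return serial numbers start, start + k, start + 2k, ... where k is the class interval."""
    interval = population_size // sample_size  # class interval, 80 // 10 = 8 in the example
    if not 1 <= start <= interval:
        raise ValueError("the starting number must lie in the first class")
    return [start + step * interval for step in range(sample_size)]


students = systematic_sample(80, 10, start=6)
print(students)  # [6, 14, 22, 30, 38, 46, 54, 62, 70, 78]
```

In practice `start` would itself be drawn at random from 1 to the class interval. It is fixed at 6 here only to reproduce the example.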


Merits of Random Sampling Method 

(1) Scientific Technique : Random sampling is the most
scientific method of selecting a sample from the population because
in it all units of the population get an equal opportunity of being
selected and there is no possibility of any type of bias.

(2) Measurement of sampling error is possible : In this method,
the measurement of sampling error is possible at different levels of
significance. The conclusions drawn by it can also be tested.

(3) Economical method : Being a very simple procedure of
sample selection, the work is done in very little time in this
method. Labour and money are not wasted.

(4) Suitable Representation : All the characteristics of the
population exist in the sample selected by this method. Hence, it is a
suitable and true representative of the population.

(5) Testing of Reliability : The original sample can be tested in
this method. For this, the help of a sub-sample can be taken.

Demerits of Random Sampling 

(1) Need of complete catalogue : A complete catalogue of all
units of the population is essential in this method. If the
investigator does not have a complete list of the units, the use of
random sampling is not possible.

(2) More Expenditure : In field surveys, the units of a sample
selected on the basis of random sampling are widely dispersed and
the cost of collecting data becomes too large. Time and money are
also spent in preparing the list and the slips.

(3) Widely spread population : If the population is widely spread,
some units have to be ignored on practical grounds at the time of
sample selection. Thus the sample may not represent the population
properly and the procedure of random sampling may not be followed.

(4) Lack of Representation : If the sample size is very small and
the population is very heterogeneous, then the sample selected on
the basis of random sampling does not represent the population
properly.

(5) Ignorance of important units : When a sample is selected by
random sampling, important units may sometimes be left out of the
sample although their study is essential for the investigation, and
the results from a sample so selected can’t give the proper level of
accuracy.
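
Difference Between Deliberate Sampling and Random
Sampling

The main differences between deliberate sampling and random
sampling are given below :

S.N. | Base of Difference | Deliberate Sampling | Random Sampling
1. | Selection of sample | Units are selected according to the judgement and will of the investigator. | Units are selected entirely by chance.
2. | Doubt of biasedness | Personal bias of the investigator may influence the selection. | There is no possibility of personal bias.
3. | Equal chance | Units do not get an equal chance of being selected. | Each unit has an equal chance of being selected.
4. | Sampling error | Sampling error cannot be calculated. | Sampling error can be measured.
5. | Suitability | Suitable where deep and specific study of particular units is essential. | Suitable where a complete list of units is available and a representative sample is required.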

3. Mixed Sampling 

Methods of sampling which combine the merits of both purposive
sampling and random sampling are said to be mixed sampling :


(i) Stratified sampling : In this method, purposive sampling and
random sampling are combined. To select a sample by this method,
separate groups or classes of units are first made on the basis of
different attributes of the population. After that the sample is
constructed by selecting some units from each of these groups on the
basis of random sampling. For instance, to study the financial
condition of the population of a city whose total population is 1000,
separate classes of people engaged in different professions will be
made. Suppose the classes consist of 500 persons in agriculture, 50
in grocery shops, 300 in service and 150 in other professions, and a
sample of 100 persons is to be selected. We will then select 10% of
the persons from each class by random sampling. So we will obtain a
sample of 50 persons from agriculture, 5 from grocery, 30 from
service and 15 from other professions. Such selection is said to be
proportional stratified sampling. If we select an equal number of
persons from each class (25 persons from each class in this example),
then this type of selection is said to be non-proportional
stratified sampling.
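
The allocation in this example can be sketched in code. This is only an illustration under stated assumptions: the function names are hypothetical, persons within a class are represented by serial numbers, and whole-number division is used because every share in the example comes out exact (other figures would need a rounding rule).

```python
import random

# Class sizes from the example (total population 1000)
strata_sizes = {"agriculture": 500, "grocery shop": 50, "service": 300, "other professions": 150}


def proportional_allocation(sizes, sample_size):
    """Give each class the same fraction of the sample as it has of the population."""
    total = sum(sizes.values())
    return {name: size * sample_size // total for name, size in sizes.items()}


def equal_allocation(sizes, sample_size):
    """Give every class the same number of sample units (non-proportional)."""
    share = sample_size // len(sizes)
    return {name: share for name in sizes}


def draw_within_strata(sizes, allocation, seed=None):
    """Pick serial numbers at random inside each class, according to the allocation."""
    rng = random.Random(seed)
    return {name: rng.sample(range(1, sizes[name] + 1), allocation[name]) for name in sizes}


proportional = proportional_allocation(strata_sizes, 100)
equal = equal_allocation(strata_sizes, 100)
drawn = draw_within_strata(strata_sizes, proportional, seed=1)
print(proportional)  # {'agriculture': 50, 'grocery shop': 5, 'service': 30, 'other professions': 15}
print(equal)         # {'agriculture': 25, 'grocery shop': 25, 'service': 25, 'other professions': 25}
```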

(ii) Multistage sampling : When the area of investigation is very
wide, this method of sampling becomes very suitable. In this
technique the selection from the whole population is carried out in
various stages. The population is first divided into groups, and the
groups so obtained are again subdivided on the basis of random
sampling. Some units are then selected by random sampling from the
groups obtained through this subdivision. In India, the National
Sample Survey Organisation (N.S.S.O.) uses this technique in its
investigations.

For instance, if a survey of agricultural yield in Madhya
Pradesh is to be conducted, then we will first select districts, then
select tehsils from those districts by random sampling and then
select villages from those tehsils by random sampling. In this way,
the sample will be prepared in multiple stages by random sampling.
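
The district, tehsil and village stages can be sketched as nested random draws. Everything in the sketch is hypothetical: the sampling frame is made-up data with placeholder names, and the number of units taken at each stage is chosen only for illustration.

```python
import random


def multistage_sample(frame, n_districts, n_tehsils, n_villages, seed=None):
    """Stage 1: districts, stage 2: tehsils within them, stage 3: villages within those."""
    rng = random.Random(seed)
    chosen = {}
    for district in rng.sample(sorted(frame), n_districts):
        tehsils = frame[district]
        chosen[district] = {}
        for tehsil in rng.sample(sorted(tehsils), n_tehsils):
            chosen[district][tehsil] = rng.sample(tehsils[tehsil], n_villages)
    return chosen


# Made-up frame: 4 districts, 3 tehsils in each, 5 villages in each tehsil
frame = {
    f"District {d}": {
        f"Tehsil {d}.{t}": [f"Village {d}.{t}.{v}" for v in range(1, 6)]
        for t in range(1, 4)
    }
    for d in range(1, 5)
}

result = multistage_sample(frame, n_districts=2, n_tehsils=2, n_villages=3, seed=7)
print(result)
```

Only the selected districts and tehsils need to be listed in detail, which is why the method suits a very wide area of investigation.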


(iii) Multiphase sampling : In the multiphase sampling technique,
a large sample is first selected from the population by the random
sampling method. After that, sub-samples are selected from that large
sample for various problems according to the needs of the study.
These sub-samples are used at various phases of the study. In this
way, the sample is selected in various phases in this method.

(iv) Cluster sampling : When the population is divided into
various clusters and a sample of clusters is made by selecting
clusters through random sampling, the method is said to be cluster
sampling. In this method, each unit of the population does not get an
equal opportunity of being selected in the sample. The units of the
clusters selected in the sample are examined. This method is used to
examine the quality of industrial products.


4. Other Methods of Sampling 


Apart from the aforesaid methods, some other methods of sampling
are also used. They are described below :


(i) Extensive sampling : In this method of sampling, most of
the units of the population are included in the sample. Units from
which data collection is expected to be difficult are left out. The
advantage of this method is that the data are collected from a
large part of the population, but if the ignored units are important,
the result will be affected and the conclusions may be misleading.

(ii) Quota sampling : This technique was developed in
America. It is a type of purposive sampling. In this method the
population is divided into some parts and a quota is then fixed for
each investigator’s work. The investigator is free to select the
units of his quota, using his skill, judgement and experience. The
random sampling technique is not used to select the units. In this
method the investigator does not spend much time and money on data
collection because he collects the data from those persons whom he
meets in the group. When his quota is completed, he finishes his
work. This method is often used in public opinion studies and
professional surveys.

(iii) Convenience sampling : In this method, the investigator
collects the units according to his need and convenience. He does
not use any specific technique in the collection of units. It is not
necessary that all the characteristics of the population will appear
in the sample constituted by the units so selected. It is likely to
suffer from the bias of the investigator. It is not a scientific
method, so it is said to be an opportunist method. This method
is useful in those situations where the population cannot be
defined clearly.

(iv) Area sampling : In this method the samples are collected on
the basis of area. This technique is often used in studies related
to the population. The population is divided on the basis of area,
and some of the areas so formed are chosen by random sampling. Thus
the sampling unit is the ‘area’ in this method, and the units about
which the data are collected are said to be primary units.
Expenditure is rather low in this technique, but if the number of
primary units is small the conclusions may be misleading.

(v) Sequential sampling : In sequential sampling the size of the
sample is not determined in advance. Various samples may be taken
in this method. After checking each sample it is decided whether that
sample is to be accepted or rejected. Thus, after the inspection of
various samples, a final decision is reached. This method is often
used to accept or reject a ‘lot’ when inspecting the quality of a
commodity.

(vi) Self-selected sample : Under this method the investigator
does not decide the units of the sample. Instead, the units
themselves decide whether or not they are to be included as sample
units. This method is generally used to find the T.R.P. of a
programme on T.V. and radio. For instance, if opinion is solicited on
a T.V. serial and viewers are requested to send their opinion to a
particular address through a website or e-mail, then a person sends
such an opinion of his own accord. He is not selected by the
investigator. That is why this method is said to be self-selected
sampling.

(vii) Line sampling : This method is mainly used in surveys
related to agriculture. Some points are selected on a map by random
sampling, lines are then drawn at random in various directions by
joining these points, and the required data are collected from the
fields lying near those lines.


SIZE OF SAMPLE 


The number of sampling units selected from the population for the
purpose of investigation is known as the sample size. Deciding the
proper size of the sample is a major challenge in reaching accurate
conclusions in an enquiry. If the sample is very small, it may not
represent all the characteristics of the population properly. If it
is larger than needed, time, labour and money will be wasted and,
apart from this, managing a large organization will be a big problem.
So the sample size should be neither too large nor too small.
According to Parten, “An optimum sample survey size is one which
fulfils the requirements of efficiency, representativeness,
reliability and flexibility.”


FACTORS AFFECTING THE SIZE OF SAMPLE 


Size of sample selected in an investigation depends on the 
following factors : 


(1) Size of universe : The larger the size of the universe, the
larger the sample size should be, in the same ratio.

(2) Homogeneity or Heterogeneity of the universe : If all the
units of the population are homogeneous, a small sample may serve
the purpose, but if the universe consists of heterogeneous units a
large sample may be inevitable.

(3) Size of questionnaire : If the questionnaire used in the
investigation contains a large number of questions and the questions
are difficult in nature, the size of the sample should be kept small.

(4) Number of classes proposed : If the number of classes
proposed in the investigation is large, the sample size will be kept
large so that all the classes can be represented properly.

(5) Degree of accuracy : The greater the degree of accuracy
desired, the larger will be the sample size so that all classes can
get proper representation.

(6) Nature of study : The nature of the study also affects the
sample size. If the units are to be studied intensively and
continuously, a small sample may be suitable. The sample size should
be kept small for technical studies, while for a general survey the
size of the sample should be kept large.

(7) Availability of time and resources : The sample size depends
on the resources available for the investigation, i.e., labour, money
and required time. If the investigator has vast resources of money
and time, a large sample should be taken, and if he has a scarcity of
money and time, the sample size should be kept small.

(8) Nature of unit : If the sample units are scattered far apart
over a vast geographical area, the sample size should be kept small,
and if it is expected that most of the units will not respond, the
sample size should be taken large.

Calculation of sample size : Prof. Parten has derived the
following formula to calculate sample size :

$$n = \left( \frac{z \, \sigma}{\text{Precision}} \right)^2$$

where, n = Sample size,

σ = Standard deviation of the population

Precision = the degree of accuracy desired

z = Value at a specific confidence level of desired degree of
precision

If we take the 95% confidence level, then z = 1.96; if we take the
99% confidence level, then z = 2.58.

Example : If the standard deviation of the population is ₹ 1,200
and the precision required is ₹ 100, then the sample size at the 95%
confidence level will be,

$$n = \left( \frac{1.96 \times 1{,}200}{100} \right)^2 = (23.52)^2 \approx 553$$

i.e., the sample will consist of 553 units.
When population is infinite or unlimited :

$$n = \left( \frac{z \, \sigma}{d} \right)^2$$

where n = Sample size

σ = Standard deviation of the population

d = the difference between population mean and sample mean

z = Value at a specific confidence level. It will be 1.96 at 95%
confidence level and 2.58 at 99% confidence level

Example : If the standard deviation of the population is 12, the
population mean is 40 and the sample mean is 37, then the sample size
at the 95% confidence level will be,

$$n = \left( \frac{1.96 \times 12}{40 - 37} \right)^2 = (7.84)^2 \approx 61 \text{ units}$$
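
Both worked examples above square the ratio of z times the standard deviation to the permitted error, so one small function covers them. This is a minimal sketch with a hypothetical function name. The results are rounded to the nearest whole unit, as the examples do.

```python
def sample_size(z, sigma, error):
    """n = (z * sigma / error) ** 2, where error is the precision (or mean difference) allowed."""
    return (z * sigma / error) ** 2


# Parten's example: standard deviation 1,200, precision 100, 95% confidence (z = 1.96)
parten = sample_size(1.96, 1200, 100)

# Infinite population example: standard deviation 12, d = 40 - 37, 95% confidence
infinite = sample_size(1.96, 12, 40 - 37)

print(round(parten), round(infinite))  # 553 61
```

Halving the permitted error multiplies the required sample size by four, because the error appears squared in the denominator.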
Finite Universe :

$$n = \frac{\sigma^2}{\dfrac{d^2}{z^2} + \dfrac{\sigma^2}{N}}$$

where N = Number of total units of population

Example : If the standard deviation of the population is 6, the
population mean is 22, the sample mean is 20, the confidence level is
99% (z = 2.5758) and the number of total units in the population is
500, then the sample size will be,

$$n = \frac{6^2}{\dfrac{2^2}{2.5758^2} + \dfrac{6^2}{500}} = \frac{36}{0.603 + 0.072} = \frac{36}{0.675} = 53.3 \text{ or } 53 \text{ units}$$
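
The finite-universe calculation can be checked numerically. The sketch below assumes the standard finite-population form n = σ² / (d²/z² + σ²/N), which reduces to the infinite-population formula as N grows very large. The function names are hypothetical.

```python
def infinite_sample_size(z, sigma, d):
    """Sample size when the population is unlimited: (z * sigma / d) ** 2."""
    return (z * sigma / d) ** 2


def finite_sample_size(z, sigma, d, population_size):
    """Sample size for a population of N units: sigma^2 / (d^2 / z^2 + sigma^2 / N)."""
    return sigma ** 2 / (d ** 2 / z ** 2 + sigma ** 2 / population_size)


# Figures of the example: sigma = 6, d = 22 - 20, z = 2.5758 (99%), N = 500
n_finite = finite_sample_size(2.5758, 6, 22 - 20, 500)
n_infinite = infinite_sample_size(2.5758, 6, 22 - 20)

print(round(n_finite, 1), round(n_infinite, 1))  # about 53.3 and 59.7
```

The finite-universe figure is smaller than the unlimited-population figure because the term σ²/N enlarges the denominator. With these figures the computation gives about 53 units.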


TEST OF RELIABILITY OF SAMPLE 


If the sample does not represent all the characteristics of the
population, the conclusions drawn from its study will be false and
misleading. Hence, before the investigation, it is essential to decide
whether the sample represents the entire population or not. For this,
the reliability of the sample can be tested in the following ways :

(i) The selected sample is divided into two equal parts and their
characteristics are compared with each other. If the results are
similar, the sample is reliable.

(ii) Another sample of the same size is selected from the population
by the same method and the results of the two samples are compared. If
both results are similar, the sample is reliable.

(iii) If the measures of the universe can be determined, then the
measures of the population are compared with the measures of the
sample. If both measures are similar, the sample is reliable.
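
The first test (dividing the sample into two equal parts) can be sketched as below. This is only an illustration under stated assumptions: the mean is used as the characteristic being compared, and the `tolerance` that decides when two results count as "similar" is a hypothetical parameter which the investigator would have to choose.

```python
import random
from statistics import mean


def split_half_means(sample, seed=0):
    """Shuffle the sample, split it into two equal halves and return both means."""
    items = list(sample)
    random.Random(seed).shuffle(items)  # fixed seed so the split is repeatable
    half = len(items) // 2
    return mean(items[:half]), mean(items[half:2 * half])


def is_reliable(sample, tolerance, seed=0):
    """Treat the sample as reliable if the two half-sample means differ by at most `tolerance`."""
    first, second = split_half_means(sample, seed)
    return abs(first - second) <= tolerance


# Hypothetical sample of monthly incomes.
incomes = [2100, 2500, 1900, 2300, 2700, 2200, 2400, 2000]
print(split_half_means(incomes))
print(is_reliable(incomes, tolerance=300))
```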


PREPARATION OF QUESTIONNAIRE


Generally, a questionnaire is used to collect the primary data for an
investigation. The investigator obtains statistical information and
facts by getting the questionnaire filled in. The success of the
investigation and of data collection by questionnaire depends on the
quality of the questionnaire.

The questionnaire should be such that unbiased information can be
obtained with sufficient accuracy. In the collection of primary data,
the schedule is also used along with the questionnaire. Both the
questionnaire and the schedule contain a list of questions. The only
difference between the two is that a schedule is filled in by the
enumerator, who puts the questions to the respondent, while a
questionnaire is filled in by the respondent himself. From the point of
view of drafting, the schedule and the questionnaire cannot be clearly
distinguished. Since the questionnaire is filled in by the respondent
(i.e., the person from whom information is collected) and is mostly
sent by post, explanatory notes are included along with the questions
to help the respondent understand them.

Merits of Good Questionnaire/Essential Points
before Framing the Questionnaire

A good questionnaire is one which can obtain all the information
required for a statistical investigation without favour or prejudice,
i.e., without bias. The following are the essential points of a good
questionnaire :

(1) Least possible questions : As far as possible, the questionnaire
should contain the least number of questions. But it should be kept in
mind that the questions should not be so few that important questions
yielding essential information find no place in the questionnaire.

(2) Easy and clear questions : The questions included in the
questionnaire should be easy to understand and their meaning should be
clear. The questions should not be lengthy, complicated or ambiguous.
If the respondent understands the question clearly, he will answer
correctly. Easy language should be used in the questionnaire. As far as
possible, the use of obsolete, complicated and disrespectful words
should be avoided.

(3) Briefness : As far as possible, the questions should be short.
Brief questions are easy to understand and easy to answer.

(4) Nature of Questions : Generally, four types of questions are
included in a questionnaire :

(i) Alternative Questions : These are questions which are answered
with ‘Yes’ or ‘No’. The use of this type of question is considered
excellent. Questions such as ‘Do you have your own house ?’, ‘Are you
in service ?’, ‘Are you married ?’, ‘Do you have a T.V. ?’, etc. can be
answered with ‘Yes’ or ‘No’.

(ii) Multiple Choice Questions : These questions may have several
possible answers. The investigator writes a set of possible answers
against each such question and the respondent has to tick the
appropriate answer. For instance,

(A) What is your annual income ?
(a) Less than ₹ 5,000 [ ]
(b) ₹ 5,000 to ₹ 20,000 [ ]
(c) ₹ 20,000 to ₹ 1,00,000 [ ]
(d) More than ₹ 1,00,000 [ ]

(B) Which vehicle do you have ?
(a) Cycle [ ]
(b) Motor cycle [ ]
(c) Scooter [ ]
(d) Car [ ]

(iii) Specific Information Questions : These questions are asked
to obtain some specific type of information. For example, ‘What is
your date of birth ?’, ‘What is the name of your wife ?’, etc.

(iv) Open Questions : The views of the respondent are obtained
through this type of question. Blank space is left for the answer,
where the respondent writes his view himself. No alternatives are given
to him. For example, ‘What is your opinion about child marriage ?’,
‘What would be the proper solution to terrorism ?’, ‘To what extent are
you satisfied with the Income Tax System ?’, etc.

(5) Sequence of Questions : The sequence of questions has special
importance in a good questionnaire. Questions should be arranged in
such a way that the respondent’s train of thought is maintained, he
feels no mental strain in moving from one question to the next, and he
feels at ease in answering. For this, easy and uncontroversial
questions, e.g., your name, your age, etc., should be asked in the
beginning and after that the questions should be arranged according to
topic.

(6) Undesirable Questions : Questions which hurt anybody’s
self-respect or his social and religious sentiments should not be
included in the questionnaire. Questions which may arouse agitation,
suspicion or hostility should also not be included. For example, ‘Do
you smoke ?’, ‘Are you suffering from AIDS ?’, ‘Are you devoted to your
religion to the point of obstinacy ?’

(7) Directly Related : Only those questions which are directly
related to the investigation should be included in the questionnaire;
otherwise time and money will be wasted on useless questions and the
desired data will not be obtained.


(8) Test of correctness : Questions whose answers can be
cross-checked against one another should also be included in the
questionnaire, e.g., a person’s data on income and expenditure can be
checked against his saving.

(9) Knowledge of answer of the question : Only those questions
whose answers are known to the respondent should be placed in the
questionnaire. Questions which the respondent does not know about, or
for which he has to strain his memory, should not be asked. For
example, a common man can’t be expected to know about the causes of
heart attack, the Garner v. Murray rule or the Marshall money rule.

(10) No Mathematical calculation : Questions which require
mathematical calculation for their answer should not be included,
because a common man can make mistakes in mathematical calculation.

(11) Direction : There should be brief and clear-cut directions for
filling in the questionnaire; otherwise the respondent might face
difficulties.

(12) Setting of Questionnaire : A good questionnaire should be
presented in an attractive form. Sufficient space should be left for the
respondent to write the answer. The following points should be kept in
mind for a well-framed questionnaire :

(i) Quality of paper : The paper used for the questionnaire should
be smooth and of fine quality so that it is easy to write on and does
not get cut or torn in folding.

(ii) Margin : It is essential to leave sufficient margin on the left
side of the questionnaire. This makes the questionnaire look
attractive. Also, at the time of filing, there is no fear that important
information will be hidden in the binding.

(iii) Space for answer : It is essential to leave proper space
between the questions so that the answers to different questions do not
get mixed up. It is also essential to leave sufficient space according
to the length of answer each question requires.

(13) Pre-testing and correction : After the questionnaire has been
prepared, it should be tested by getting it filled in by some
respondents, to check whether sufficient data for the investigation are
being obtained or not. If necessary, the questions should be corrected
and expanded.

(14) Selection of the method of Tabulation : The tabulation
method should be decided in the beginning. The data should be collected
in accordance with the planned tabulation so that no inconvenience
arises in analysing the data after their collection.

(15) Covering letter : A covering letter should be sent to the
respondents with the questionnaire, giving brief information about the
investigation and assuring them that the information given by them will
be kept confidential. A self-addressed and stamped envelope should also
be enclosed for the respondent’s reply.

CONSTRUCTION OF QUESTIONS IN 
QUESTIONNAIRE 


The investigator is expected to take sufficient precautions in
preparing the questionnaire. The questions included in it should be in
accordance with the topic of the investigation, and the aforesaid
precautions should be kept in mind in its preparation. The questions of
the questionnaire can be categorized into three main parts :

(1) Introductory (2) Main Body (3) Conclusive

They can be explained in detail as follows :


(1) Introductory part : It is the beginning part of the
questionnaire. The title of the questionnaire and the questions related
to the respondent’s introduction, for instance, his name, address, date
of birth, etc., are placed in it.


(2) Main Body : It is the middle part of the questionnaire. All
questions related to the topic are included in it. It is essential to
define the topic clearly before selecting the questions. The topic
should be analysed after understanding it clearly. To obtain the
required answers, the questions can be divided into two parts : direct
questions and indirect questions.

Direct questions are directly related to the topic. They are clear
and easy, and the respondent can answer them easily. For example,
‘What is your means of livelihood ?’ or ‘Which vehicles do you
have ?’, etc. On the other hand, indirect questions are used on matters
about which the respondent avoids answering or does not want to give
direct information. In such a situation, an effort is made to get the
required information by asking some indirect questions. For
information about the income of a person, the indirect questions
asked are : ‘Do you file an income tax return ?’, ‘What is the estimated
annual sale of your business ?’, ‘How much sales tax and how much
income tax have you paid in past years ?’, etc.

(3) Conclusive part : It is the last part of the questionnaire.
Generally, in this part, the ideas and suggestions of the respondent
about the related topic are solicited. For example, ‘In your view, what
further changes can be made in the village employment scheme ?’ or
‘What do you expect from the government regarding employment ?’, etc.

Specimen of Questionnaire 

Que. 1. Name and address of the self-help group :

Que. 2. Date of formation and number of members :

Que. 3. Bank branch where the group’s account is held :

Que. 4. Aim of forming the group :

Que. 5. Details relating to the formation of the group :

1. Who gave the inspiration :
(a) Development officer
(b) Inspirer
(c) Anganwadi worker

2. What was the starting amount per member : (a) ₹ 30 (b) ₹ 50 (c) ₹ 100.

3. At present, saving per member :

4. Total saving of the group :
1. Principal saving of the group ..............
2. Interest earned by the group ..............
Total Amount ..............

Que. 6. Age of group : (a) Less than 6 months (b) 6 months to 1 
year (c) 1 to 3 years (d) 3 to 5 years (e) more than 5 years. 

Que. 7. Grading status :

1. Passed 1st grading : Yes/No

If yes, revolving fund received :

(a) From district panchayat ..............

(b) From bank ..............

2. Passed 2nd grading : Yes/No; if yes, loan obtained : Yes/No

Activity chosen : Yes/No, Name of activity ..............

Loan limit ..............

Que. 8. Training received for running the group : Yes/No

Que. 9. Description of current activities :

1.

2.

3.

Que. 10. Status of loan repayment : (a) Regular (b) Irregular (c)
Fully repaid.

Que. 11. Current income per month per member :
(a) Less than ₹ 2,000 (b) ₹ 2,000 to ₹ 3,000 (c) ₹ 3,000 to ₹ 5,000
(d) More than ₹ 5,000.

Que. 12. Change in social condition :

                                            Previous   Present   From where got inspiration
1. Literacy
2. Lavatory
3. Family planning
4. Education of children
5. Immunization of children
6. Child marriage
7. Leadership capacity
8. Alertness against social-evil practice

Que. 13. Problems from the beginning of the group till now :

1. Individual problem of group : 

2. Problem from bank : 

3. Problem from office : 

4. Other problem : 

Que. 14. Name of main business activity and starting date : 

Que. 15. Market Management— 

1. By self : (a) Fair (b) Haat Bazar (c) Door-to-door sale

2. By Government : (a) Purchasing goods from home 

(b) Organizing Fair/exhibition (c) Sale out of state 

Que. 16. Problem in sale : 

1. Lack of Quality : Yes /No 

2. Lack of Technical Education : Yes/No 

3. Availability of market : Yes/No 

4. Lack of produced goods : Yes/No 

5. Co-operation of Government : Yes/No 

6. Problem of Transport : Yes/No 

Que. 17. If there is a lack of produced goods, its cause :

1. Lack of Capital : Yes/No 

2. Lack of Water/Electricity : Yes/No 

3. Lack of Trained Labour : Yes/No 

Que. 18. Your suggestions for successful implementation of the Golden
Jubilee village self-employment scheme :

(a) Group formation/running :

(b) Government level :

(c) Bank level :

(d) Local government level :

(e) Others :

THEORETICAL QUESTIONS 


Long Answer Questions 




1. What is meant by random sampling ? State the merits and demerits of it. 

2. What do you understand by random sampling ? Explain various methods of it. 

3. Write short notes on the following : (i) Deliberate sampling, (ii) Random 
sampling 

4. What is meant by size of sample ? State the factors affecting the size of 
sample. 

5. What do you understand by size of sample ? Explain its calculation methods.
How can we test the reliability of sample ?

6. What is questionnaire ? What are the essentials of a good questionnaire ?
What points should be taken into consideration while selecting the enumerator ?

7. Give the qualities of a good questionnaire.

8. Draft a questionnaire containing 18 questions for the study of economic
conditions and habits of students of your college hostel.

9. Distinguish between Questionnaire and Schedule. (Bhopal 2009)


Short Answer Questions


1. What is meant by sampling ?

2. Write the essentials of sampling.

3. What do you understand by random sampling ?

4. What is deliberate sampling ?

5. State the difference between deliberate sampling and random sampling.

6. Write a short note on any one of the following :
(i) Multistate sampling (ii) Multistage sampling (iii) Multiphase sampling
(iv) Cluster sampling (v) Extensive sampling (vi) Quota sampling
(vii) Area sampling (viii) Line sampling

OBJECTIVE QUESTIONS 


Choose the correct answers : 


1. Merit of a good questionnaire is : 
(a) Proper sequence (b) Proper place 
(c) Questions short and clear (d) All of the above 


2. Which quality of a question should not be in a questionnaire ?
(a) Simplicity (b) Personal (c) Clarity (d) Proper sequence


3. The greatest demerit of a questionnaire is that persons :
(a) return it (b) do not return it
(c) do not answer the questions (d) do not go through it carefully


4. The main drawback of schedule system is that it is : 
(a) less expensive (b) expensive (c) more expensive (d) not expensive 


[Ans. 1. (d), 2. (b), 3. (b), 4. (c)] 


Deliberate sampling :

(1) In this method all units of the sample are selected at the will of
the investigator.
(2) In this method the will of the investigator is supreme in the
selection of units, so a doubt about bias remains in the sample
selection.
(3) In this method all units of the population do not have an equal
chance of being chosen in the sample.
(4) In this method estimation of sampling error is not possible.
(5) This method of sampling is suitable for investigations where all
units are almost the same and it is essential to include some
particular units in the sample.

Random sampling :

(1) In this method units are selected at random.
(2) In this method no unit is selected in the sample by the will of the
investigator, so there is no chance of bias.
(3) In this method each unit of the population has an equal opportunity
of being selected in the sample.
(4) In this method sampling error may be estimated on the basis of
probability theory.
(5) This method remains suitable in all other areas.


CLASSIFICATION &
TABULATION OF DATA


* Classification of Data
* Objectives of Classification
* Essential Elements for Ideal Classification
* Types of Classification
* Tabulation of Data
* Objects and Importance of Tabulation
* Difference between Classification and Tabulation
* Main Parts of Table and General Rules of Tabulation
* Presentation of Textual Data in Tabular Form

When data are collected under any research assignment or enquiry,
they are in a raw state and are not suitable for any statistical
treatment. It is essential that the data be summarized. This work may
be done by classification and tabulation.

Classification is the process of arranging the data collected under
any investigation in groups or classes according to their
characteristics or attributes. Thus classification is only the first
step in the summarization of the data. The classification of data is
comparable to the sorting of letters into different lots in the post
office. As the post office sorts the letters of different regions into
different bags, in the same way data are classified by putting them
into different classes according to their characteristics and
attributes. For example, the human population can be divided into two
groups of males and females, or into different groups according to age.
In a similar way, the students of a university can be divided into
three groups of I, II and III division according to the marks obtained
in the examination. When the data classified into different groups are
put into rows and columns, it is known as tabulation. So classification
prepares the base for tabulation.


CLASSIFICATION OF DATA 


Definition of classification : The following are some main 
definitions of the classification of the data : 

According to Connor, “Classification is the process of arranging 
things in groups or classes according to their resemblances and 
affinities.” 

To quote Secrist, “Classification is the process of arranging data
into sequences and groups according to their common
characteristics.”


Characteristics of the classification : The following are the
characteristics of the classification according to its definition :

(1) The collected data are arranged into various groups under
classification.

(2) Classification of data is done on the basis of their 
characteristics or attributes. 

(3) Classification of the data may be done in the real or imaginary 
form. 

(4) One may see unity in diversity of the data. 


OBJECTIVES OF CLASSIFICATION

The main objectives of the classification of the data are : 

(1) To express the similarity and diversity of data : The similarity
and diversity of statistical facts are brought out by the
classification of data. The data having the same characteristics or
attributes are put together. For example, persons may be classified as
educated-uneducated, failed-passed, married-unmarried and
employed-unemployed.

(2) To be helpful in comparison : Classification makes it easy to
compare data. For example, if the marks of the B.A. students of two
colleges are given separately, it will be very difficult to compare
their intelligence level. But if the students of the colleges are
classified into three groups of I, II and III division on the basis of
their marks, it becomes very easy to see which college’s students have
the better level of intelligence.

(3) To make data simple and brief : The main task of
classification of data is to eliminate irrelevant details and make the
data simple and brief. It helps in forming a mental picture of the
data. It also saves the mind from unnecessary labour.

(4) To arrange logically : Classification is a logical process. Data
are arranged into different groups on the basis of their original
characteristics. Data become easy, clear and understandable due to
classification, and more scientific and systematic after it.

(5) To prepare base of the tabulation : Classification provides 
base for tabulation and statistical analysis of the data. 


ESSENTIAL ELEMENTS FOR GOOD 
CLASSIFICATION 


The following are the essential elements for good classification :

(1) Exhaustiveness : Classification should be so exhaustive that
each and every unit is included in one of the classes. If some values
are left out, it means the classification has not been made properly.

(2) Clarity and unambiguity : The formation of the classes should be
clear, simple and definite. There should be no doubt about which value
is to be put in which class.

(3) Stability : The base of the classification should be stable from
beginning to end. It may be difficult to compare the data and draw
conclusions if we change the base of classification repeatedly.

(4) Suitability : Classification should be according to the purpose
of the investigation. For example, if we have to compare the results
of two colleges, then the classification of students should be done on
the basis of class, number of students in the class and number of
passed students.

(5) Homogeneity : The units of a particular class should conform to
the attribute on the basis of which the classification is done.

(6) Flexibility : Formation of the classes should be done in such a 
way that changes may be done according to need. There should be 
proper flexibility in the construction of the classes. 

(7) Mathematical accuracy : Total of the items included in the 
different classes should be equal to the grand total. Thus 
mathematical accuracy is also essential for the classification. 


TYPES OF CLASSIFICATION 


Classification may be broadly divided into two categories :
(1) Classification according to attributes
(2) Classification according to class interval


1. Classification according to Attributes

In this type of classification the collected data are classified on the
basis of some attribute or quality, e.g., sex, education, literacy,
religion, etc. The population of a country can be divided into two
groups : male and female, or educated and uneducated, or Hindu and
non-Hindu, etc. Here classification is made on the basis of attributes,
the groups being differentiated by the presence or absence of the
attribute.

If the data are classified on the basis of one attribute only, the
process is known as simple classification.

Where more than one attribute is studied, resulting in a sub-division
of classes, the classification is known as manifold. Thus, the
population of a country may be divided into male and female. Males and
females may each be divided again into literate and illiterate.

                    Population
                        |
            +-----------+-----------+
            |                       |
          Male                   Female
            |                       |
      +-----+-----+           +-----+-----+
      |           |           |           |
  Literate   Illiterate   Literate   Illiterate
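
Simple and manifold classification by attributes can be illustrated with a short tally. The sketch below is only an illustration in Python; the six records are hypothetical data invented for the example.

```python
from collections import Counter

# Hypothetical records: each person is described by two attributes (sex, literacy).
people = [
    ("Male", "Literate"),
    ("Male", "Illiterate"),
    ("Female", "Literate"),
    ("Male", "Literate"),
    ("Female", "Illiterate"),
    ("Female", "Literate"),
]

# Simple classification: one attribute only (sex).
simple = Counter(sex for sex, _ in people)

# Manifold classification: classes sub-divided by a second attribute (literacy).
manifold = Counter(people)

print(simple["Male"], simple["Female"])   # 3 3
print(manifold[("Male", "Literate")])     # 2
# Every unit falls in exactly one class, so the class totals add up to the grand total.
print(sum(manifold.values()) == len(people))  # True
```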


2. Classification according to Class Interval 

When the items are expressed numerically, such as age, weight,
height, etc., the classification is done by making class intervals.
Here the limits are chosen arbitrarily. The ends of the classes are
known as class limits. For example, in the class 10-20, 10 is the lower
limit and 20 is the upper limit. The difference between the upper and
lower limits is known as the width of the class. Each item is recorded
against the class in which it falls. The number of observations in each
class is known as the frequency of that class. Thus the distribution of
the whole data over the class intervals is known as a frequency
distribution.
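
A frequency distribution of this kind can be built mechanically. The following Python sketch is only an illustration: the function name and the ages are hypothetical, and it assumes equal class widths with each class including its lower limit and excluding its upper limit.

```python
def frequency_distribution(values, lower, width, classes):
    """Count how many values fall in each class interval.

    Assumption: a class such as 10-20 includes 10 but excludes 20.
    Returns a list of (lower limit, upper limit, frequency).
    """
    counts = [0] * classes
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < classes:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[index] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(classes)]


# Hypothetical ages of ten persons.
ages = [12, 25, 31, 18, 44, 39, 27, 10, 35, 49]
for low, high, freq in frequency_distribution(ages, lower=10, width=10, classes=4):
    print(f"{low}-{high}: {freq}")
# 10-20: 3
# 20-30: 2
# 30-40: 3
# 40-50: 2
```

The frequencies add up to the number of observations, which is the check of mathematical accuracy mentioned among the essentials of a good classification.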


TABULATION OF DATA 


The following are the main definitions of the tabulation : 

According to Blair, “Tabulation in its broadest sense is any orderly
arrangement of data in columns and rows.”

According to Connor, “Tabulation involves the orderly and systematic
presentation of numerical data in a form designed to elucidate the
problem under consideration.”

Tuttle stated, “Tabulation is the logical listing of related quantitative 
data in vertical columns and horizontal rows of number with sufficient 
explanation and qualifying words, phrases and statements in the 
forms of titles and headings and explanatory notes to make clear the 
full meaning, context and the origin of the data.” Thus the statistical 
table is a logical and systematic organisation of data in columns and 
rows. 


OBJECTS OF TABULATION 


The main objects of the tabulation are : 

(1) To represent the data systematically. 

(2) To represent the data in brief so that they can be easily 
understood and leave a lasting impression. 

(3) To represent the data in such a manner that the problem
becomes easy and clear.

(4) To display the data under different columns and rows so that their
comparison can be done.


IMPORTANCE OF TABULATION 


Tabulation is a common link between classification and 
interpretation of the data. Without tabulation no statistical treatment 
of a problem is possible. Main advantages of the tabulation are : 

(1) It simplifies complex data.

(2) It facilitates quick comparisons.

(3) It economises space and time.

(4) It helps in statistical interpretation.

(5) It gives an identity to the data.

(6) It is an essential step to presentation, whether data are
presented informally, in tabular form or graphically and
diagrammatically.


DIFFERENCE BETWEEN CLASSIFICATION AND 
TABULATION 


Classification and tabulation are both important statistical devices.
They are used to make the collected data easily understandable. But
they differ in the following ways :

(1) Difference in order : Classification of the data is the first
step in tabulation. Before the data are put in tabular form they have
to be classified. Thus classification prepares the base for tabulation.

(2) Tabulation is the mechanical part of classification.

(3) Classification is a device of analysis in statistics while
tabulation is a process to represent the data.

(4) In classification, data are divided into classes and sub-classes,
while in tabulation data are presented under titles and captions. Data
may be presented in the form of percentages, ratios, etc. in
tabulation.


MAIN PARTS OF TABLE AND GENERAL RULES
OF TABULATION


To prepare a first-class table is an art. In preparing a table,
attention should be given to the following points :

(1) Table number : Each table should be numbered. The number
may be given at the top of the table. The table number may be used to
refer to the table in future.

(2) Title : A good title explains in brief and concise language :

(a) What the data are,

(b) Where the data are,

(c) The classification principle and

(d) Time period of the data.

It is usually advisable to prepare the table first and write the title
afterwards.

(3) Caption : The titles of the columns are called captions. The
wording of the headings should be brief. The headings for the
columns should be in singular form, such as : year, country, etc.
Captions should be clearly defined and placed at the middle of the
column.

(4) Stubs : The titles of the horizontal rows are called stubs. The
box over the stub on the left of the table should give a description of
the stub contents. The stubs are usually wider than the captions.

(5) Main body of table : It is the most important part of table. It 
contains the numerical information. Data are recorded into columns 
and rows according to captions and stubs. 

(6) Ruling and spacing : The table drawn should be attractive, 
impressive and clear. Ruling should be clear and spacing should be 
proper. The frame of table should be properly drawn. 

(7) Footnotes : It may happen that without footnotes the data
seem to tell a story quite different from the actual facts. For
example, if we look at a table giving yearly figures of wheat
production in India, the sudden fall in the figure for 1947 would be
misleading unless there is a footnote to point out that the figures for
1947 relate to India after partition. So, whenever needed, footnotes
should be given at the bottom of the table.

(8) Source of data : The source of the data must be mentioned in
the footnotes.

(9) Size of table : The table should be neither very big nor very
small. Its size depends upon the purpose and the space available. If
the space is too small for a table, it should be divided into a number
of tables and a summary table should be prepared at the end.

(10) Total and sub-totals : A statistical table must contain
sub-totals for each separate column and row and a grand total for all.

(11) Unit of measurements : The unit of measurement should be 
clearly defined and given in table such as weight in kg and income in 
rupees. 

(12) Not available figures : Do not use zero to indicate
information which is not available. If it is not available, show this fact
by a dash (—) or by the letters N.A. (Not Available). 

(13) Approximation function : If figures are rounded to avoid 
unnecessary details in the table, a footnote to this effect should be 
given. 

(14) Miscellaneous column : A miscellaneous column should be
added for the data which do not fit in the classification made.

(15) Avoid abbreviation : Abbreviations should be avoided. For
example, ‘yr’ should not be used for ‘year’.

Actually all these rules are in the form of guidelines. These should
be used on the basis of common sense and experience. Dr. Bowley
rightly stated, “In collection and tabulation of data, common sense is
the chief requisite and experience the chief teacher.”


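A Form of Table

                          Table Title
               Head note (unit of measurement)
------------------------------------------------------------------
             |       Caption (Headings of Vertical Columns)
  Stub-Box   |----------------------------------------------------
             | Column Head | Column Head | Column Head |  Total
------------------------------------------------------------------
  Row-Head   |             |             |             |
  Row-Head   |             |             |             |
------------------------------------------------------------------
  Total      |             |             |             |
------------------------------------------------------------------
Foot-note :
Source :
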
KINDS OF TABULATION

Tabulation may be of the following types :

(1) Simple table : In a simple table only one characteristic or
attribute is shown. It is the simplest tabulation. For example, the
number of employees in Indian Overseas Bank according to their
age is shown in the following table :

Age (in years)      No. of Employees
Below 25            ...
25-35               ...
35-45               ...
45-55               ...
Above 55            ...
Total               ...


(2) Double table : Here the data are classified according to two
characteristics or attributes. The number of employees in the Bank can
be divided according to their age and sex as follows :

Age                     No. of Employees
(in years)        Male      Female      Total
Below 25          ...       ...         ...
25-35             ...       ...         ...
35-45             ...       ...         ...
45-55             ...       ...         ...
Above 55          ...       ...         ...
Total             ...       ...         ...


(3) Manifold table : In this type of tabulation the data are
arranged in rows and columns according to more than two
characteristics or attributes. The number of employees in the bank may
be distributed according to their age, sex and post as follows :

Age                              Rank                         Grand
(in years)     Clerk     Assistant     Officer     Total      Total
               M   F      M   F        M   F       M   F
Below 25       ...
25-35          ...
35-45          ...
45-55          ...
Above 55       ...
Total          ...

Note : M = Male, F = Female

IMPORTANT EXAMPLES


Illustration 1.

The quantity and price of jute exported by India to England,
America, Russia, Japan and Canada from 2008 to 2011 are given.
Prepare a suitable blank table to represent them.

Solution :

Jute exported by India to different countries from 2008 to 2011
(Quantity in tons and Price in ₹)

Country      2008          2009          2010          2011          Total
           Qnt. Price    Qnt. Price    Qnt. Price    Qnt. Price    Qnt. Price
England
America
Russia
Japan
Canada
Total

Source :


PRESENTATION OF TEXTUAL DATA IN TABULAR 
FORM 


Information given in textual form can be represented in tabular
form.

To write the information in tabular form, first read it carefully;
then decide the basis of the classification and the classes (or
groups), etc.; next make a blank table corresponding to the given
classification and write the given data in the respective cells. Here
two cases may arise :

(i) No cell is blank (ii) Some cells remain blank.

If some of the cells remain blank, find the data needed to complete
the table by mathematical manipulation. If figures are not given
directly, transform them into a suitable form.


Illustration 2. 


In a survey about coffee habits in two towns, the following 
information is given : 


Town X : Females were 40%, total coffee drinkers were 45% and 
male non-coffee drinkers were 20%. 
Town Y : Males were 55%, male non-coffee drinkers were 30% 
and female coffee drinkers were 15%. 
Solution : 


Since the data are given in percentages, we can assume that
the number of persons in each town is 100. Then

For Town X : No. of Females = 40 

Total coffee drinkers = 45 

Males non coffee drinkers = 20 

For Town Y : No. of males = 55 

Males non coffee drinkers = 30 

Females coffee drinkers = 15 

Here the classification is done according to two attributes : Sex and
Coffee habits. The given information can be presented in the following
table :


Persons Town X Town Y 

Male Female Total Male Female Total 
Coffee drinkers — — 45 — 15 — 

Non coffee drinkers 20 — — 30 — — 
Total — 40 100 55 — 100 


Process to complete the table : 
For town X :
No. of males = 100 - 40 = 60
No. of male coffee drinkers = 60 - 20 = 40
No. of female coffee drinkers = 45 - 40 = 5
No. of female non-coffee drinkers = 40 - 5 = 35
No. of persons who are non-coffee drinkers = 100 - 45 = 55

For town Y :
No. of male coffee drinkers = 55 - 30 = 25
No. of females = 100 - 55 = 45
No. of female non-coffee drinkers = 45 - 15 = 30
No. of persons who are coffee drinkers = 25 + 15 = 40
No. of persons who are non-coffee drinkers = 100 - 40 = 60


Thus complete required table is as follows (The figures obtained 
by mathematical manipulation are given in the parenthesis) : 


Persons                Town X                     Town Y
                       Male    Female   Total     Male    Female   Total
Coffee drinkers        (40)    (5)      45        (25)    15       (40)
Non-coffee drinkers    20      (35)     (55)      30      (30)     (60)
Total                  (60)    40       100       55      (45)     100
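
The completion steps for Town X can also be checked mechanically. The following Python sketch is only illustrative: the function name `complete_table` and its parameter names are assumptions introduced here, not part of the text. It fills the remaining cells from the four given figures by the same subtractions as in the process above.

```python
def complete_table(total, females, coffee_total, male_non_coffee):
    """Fill a sex x coffee-habit table from four given figures (Town X pattern)."""
    males = total - females                      # 100 - 40
    male_coffee = males - male_non_coffee        # 60 - 20
    female_coffee = coffee_total - male_coffee   # 45 - 40
    female_non_coffee = females - female_coffee  # 40 - 5
    non_coffee_total = total - coffee_total      # 100 - 45
    return {
        "coffee": (male_coffee, female_coffee, coffee_total),
        "non_coffee": (male_non_coffee, female_non_coffee, non_coffee_total),
        "total": (males, females, total),
    }


town_x = complete_table(total=100, females=40, coffee_total=45, male_non_coffee=20)
for row, (male, female, both) in town_x.items():
    print(f"{row:<12}{male:>6}{female:>8}{both:>7}")
```

Town Y is given with a different set of known cells, so it would need its own sequence of subtractions rather than this function.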
Illustration 3. 

Tabulate the following information : “In a trip organised by a
college, there were 80 persons each of whom paid ₹ 15.50 on
average. There were 60 students each of whom paid ₹ 16. Members
of the teaching staff were charged at a higher rate. The number of
servants was 6 (all males) and they were not charged anything. The
number of ladies was 20% of the total of which one was a lady staff
member.”


Solution : 

Category    Number of Tourists          Contribution    Total
            Male    Female    Total     per head        contribution
                                        (in ₹)          (in ₹)
Teacher     13      1         14        20              280
Student     45      15        60        16              960
Servant     6       -         6         0               -
Total       64      16        80        -               1,240
Method : 


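Average payment by 80 persons = ₹ 15.50
Total payment by 80 persons = 80 × 15.50 = ₹ 1,240
Payment by 60 students = 60 × 16 = ₹ 960
Number of teachers = 80 - 60 - 6 = 14
Payment by teachers = 1,240 - 960 = ₹ 280
Payment per teacher = 280 / 14 = ₹ 20
Number of ladies = 20% of 80 = 16, of whom 1 is a teacher and 15 are students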
THEORETICAL QUESTIONS 
Long Answer Questions 
1. Distinguish between classification and tabulation. 
2. What are the different parts of a table ? Explain the main precautions you will 
take in tabulating your data ? 
3. Prepare a blank table which can show the prices per quintal of wheat and
rice for the years 2010 and 2011 for 7 important grain markets of your state.
4. There are 1,440 employees in a certain company. Of all the employees one in
three is a woman and one in twelve is a married woman. One in six of the
men is married.
Tabulate the above data in complete figures.


[ Ans. Distribution of workers in a certain company according to sex and marital
status

Marital Status    Number of Employees
                  Men     Women    Total
Married           160     120      280
Unmarried         800     360      1,160
Total             960     480      1,440 ]


5. A picnic was organised by a college in which 100 persons took part. Out of 


them 75 were students and the rest were teachers; of the students 25 were 
female students and of the teachers 10 were ladies. Tabulate this information 
showing the number of males. 

[ Ans.

Category    Men     Women    Total
Teacher     (15)    10       (25)
Student     (50)    25       75
Total       65      35       100 ]


6. In a sample study about the smoking habits in two towns, the following data were
obtained :


Town A Town B 


Males in total population 52% 54% 
Smokers 26% 28% 


Male smokers 18% 20% 
Tabulate the above data. 


[ Ans.

Smoking habits in two towns (in %)
Persons        Town A                     Town B
               Male    Female   Total     Male    Female   Total
Smokers        18      8        26        20      8        28
Non-smokers    34      40       74        34      38       72
Total          52      48       100       54      46       100 ]
7. Present the following information in a suitable tabular form supplying the 


figures not directly given : 

In 2001 out of a total 4,000 workers in a factory 3,300 were members of a trade 
union. The number of woman workers employed was 500 out of which 400 did 
not belong to any trade union. 

In 2000, the number of workers in the union was 3,450 of which 3,200 were men. 
The number of non-union workers was 760 of which 330 were women. 


[ Ans.
Information on Workers in the Years 2000 and 2001 according to Gender and
Trade Union Membership

Year                         2000                          2001
Workers                      Male     Female   Total       Male     Female   Total
Member of Trade Union        3,200    (250)    3,450       (3,200)  100      3,300
Not Member of Trade Union    (430)    330      760         (300)    400      700
Total                        (3,630)  (580)    (4,210)     (3,500)  500      4,000 ]


8. Present the following information in a suitable tabular form : In 1990 out of a
total of 1,750 workers of a factory, 1,200 were members of a trade union. The
number of women employees was 200 of which 175 did not belong to a trade
union. In 1995 the number of union workers increased to 1,580 of which 1,290
were men. On the other hand, the number of non-union workers fell to
208 of which 180 were men. In 2000 there were 1,800 employees who
belonged to a trade union and 50 who did not belong to a trade union. Of all the
employees in 2000, 300 were women of whom only 8 did not belong to a trade
union.


[ Ans.

Year                         1990                    1995                    2000
Category                     Men    Women  Total     Men    Women  Total     Men    Women  Total
Member of Trade Union        1,175  25     1,200     1,290  290    1,580     1,508  292    1,800
Not member of Trade Union    375    175    550       180    28     208       42     8      50
Total                        1,550  200    1,750     1,470  318    1,788     1,550  300    1,850 ]


OBJECTIVE QUESTIONS 


Choose the correct answers 
1. The aim of tabulation is not :
(a) systematic presentation (b) committing errors (c) comparative study (d) to tell about
size
2. The effect of data on the mind through tabulation is :
(a) permanent (b) temporary (c) hypothetical (d) real


3. Whose statement is this : 
“Tabulation, in its broadest sense, is an orderly arrangement of data in columns 
and rows.” 


(a) Bowley (b) Tippett (c) Fisher (d) Blair 


4. Which statement is true ? 

(a) Classification is not the basis of tabulation. 

(b) Tabulation is mechanical aspect of classification. 

(c) To keep the data in systematic form is not an objective of tabulation. 
(d) None of these. 
5. Classification is the process of arranging data in :
(a) different rows (b) different columns
(c) different columns and rows (d) grouping of related facts in different classes


6. Classification of the students of a college as ‘rural’ and ‘urban’ is :
(a) Qualitative (b) Quantitative (c) Geographical (d) None of these 


7. Which one of the following is not a kind of statistical series : 
(a) Individual (b) Discrete (c) Classified (d) Signed 


8. When two types of information are obtained from a table, it is called :
(a) simple table (b) double table 
(c) triple table (d) manifold or higher order table 


[Ans. 1. (b), 2. (b), 3. (d), 4. (b), 5. (d), 6. (a), 7. (d), 8. (b)]
PREPARATION OF
STATISTICAL SERIES AND 
ITS TYPES 


— ° Variable 

— °* Frequency Distribution 

— ° Univariate Frequency Distribution 
— ° Bivariate Frequency Distribution 


VARIABLE 


The attributes or the characteristics that change in quantity or 
numerical values are called variables. For example, heights, weights, 
rainfall records, barometer readings etc. 

A variable may be (i) continuous or (ii) discrete. 


(i) Continuous variable : A variable which can take any 
numerical value within a certain range is called continuous variable. 
For example, weight, height, temperature, speed of car etc. 

(ii) Discontinuous (discrete) variable : A variable which cannot take
every value within a range, but only certain isolated values, is called a
discontinuous or discrete variable. For example, number of students,
number of machines, number of rooms in a house etc. Generally
continuous data are obtained by measuring while discontinuous data are
obtained by counting.


FREQUENCY DISTRIBUTION 


The arrangement of collected data according to the frequencies with
which the individual or grouped values of the variable occur is called a
‘frequency distribution’; that is, the tabular form of the collected data
in which the values of the variable as well as their frequencies are
given is called a frequency distribution.

Frequency distributions are of two types :

(1) Discrete frequency distribution 

(2) Continuous or grouped frequency distribution 


(1) Discrete Frequency Distribution 

First of all the collected data are arranged in ascending or
descending order of magnitude. Such an arrangement is commonly termed
an array. If some values are repeated many times in the array, then
instead of writing them repeatedly it is better to make a
frequency array. In this array all the given values are recorded once
only and their frequencies, that is, the number of times they occur,
are noted against them. Such an arrangement is called a discrete
frequency distribution.


Example : The numbers of mites per leaf counted on 75 leaves
are given below. Prepare a frequency distribution :

0, 0, 1, 2, 4, 0, 3, 0, 1, 2, 5, 1, 1, 2, 3, 2, 1, 0, 0, 4, 0, 2, 1, 1, 1, 0,
6, 1, 0, 4, 0, 6, 3, 2, 1, 0, 0, 0, 1, 1, 5, 4, 2, 1, 3, 0, 0, 1, 1, 2, 2, 2,
3, 1, 0, 0, 4, 3, 1, 1, 0, 0, 1, 1, 2, 3, 0, 0, 2, 1, 4, 2, 1, 0, 0

Solution : Here the variable assumes the values 0, 1, 2, 3, 4, 5
and 6 only. So the frequency distribution will be as follows
(||||/ denotes a group of five tally marks, the fifth crossing the preceding four) :

No. of Mites per leaf    Tally Marks                      Frequency
0                        ||||/ ||||/ ||||/ ||||/ |||      23
1                        ||||/ ||||/ ||||/ ||||/ ||       22
2                        ||||/ ||||/ |||                  13
3                        ||||/ ||                         7
4                        ||||/ |                          6
5                        ||                               2
6                        ||                               2
Total                                                     75
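
The counting in this example can be reproduced with a short Python sketch. It is offered only as an illustration: the helper name `tally` and the rendering of a crossed group of five as `||||/` are assumptions made here, not notation fixed by the text.

```python
from collections import Counter

# The 75 observations of mites per leaf from the example above.
mites = [
    0, 0, 1, 2, 4, 0, 3, 0, 1, 2, 5, 1, 1, 2, 3, 2, 1, 0, 0, 4, 0, 2, 1, 1, 1, 0,
    6, 1, 0, 4, 0, 6, 3, 2, 1, 0, 0, 0, 1, 1, 5, 4, 2, 1, 3, 0, 0, 1, 1, 2, 2, 2,
    3, 1, 0, 0, 4, 3, 1, 1, 0, 0, 1, 1, 2, 3, 0, 0, 2, 1, 4, 2, 1, 0, 0,
]


def tally(count):
    """Render a count as tally marks in groups of five ('||||/')."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


frequency = Counter(mites)  # value -> number of times it occurs
for value in sorted(frequency):
    print(f"{value:>3}  {tally(frequency[value]):<30}{frequency[value]:>4}")
print("Total", sum(frequency.values()))
```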


(2) Continuous Frequency Distribution 


First of all we find the difference between the highest and lowest 
observations that is called the range of the data. We divide this 
range into a number of groups known as classes. The number of 
classes depends upon the range and total number of the 
observations. We write these classes in ascending order into a 
column. Then we put a mark in the second column for each observation
against the class in which that observation occurs. Such a mark is
called a ‘tally mark’. After putting the tally marks for all observations
we count the tally marks lying against each class and write this
number in the third column against that class. A systematic
representation of this type is known as a continuous frequency
distribution. It can be prepared by two methods.

(i) Exclusive method : In this method the limits of the classes are
so fixed that the upper limit of each class is equal to the lower limit of the
next class, e.g. 5-9, 9-13, 13-17, ... etc. An item having a value
equal to the upper limit of a class is put in the next class. For
example, if the value of an item is 13, it will be put in 13-17 instead of
9-13. That is, the upper limit is not included in the class in this type of
classification.

(ii) Inclusive method : In this method the limits of the classes are 
so fixed that the limits of two consecutive classes will not be 
common, e.g. 5-8, 9-12, 13-16, ....etc. The upper limit is included in 
this type of classification. For example, if the value of an item is 16, 
then that will be put in 13-16. 


Illustration 1. 


25 students obtained the following marks in a paper of Business 
Statistics having 50 marks. Form a grouped frequency distribution by 
taking class intervals of ten marks. 


Solution : 


Frequency table by exclusive method :
Class     Tally marks    Frequency
0-10      ||||/          5
10-20     ||||/ ||       7
20-30     ||||/ ||       7
30-40     ||||           4
40-50     ||             2
Total                    25


Illustration 2. 


Form frequency distribution of the following 15 marks by inclusive 
method taking class interval of 2. 
5,10, 12, 15, 17, 21, 10, 19, 6, 14, 25, 13, 17, 21, 23 
Solution : 


Frequency distribution 
Class Tally Marks Frequency 


5-7 || 2 
8-10 || 2 
11-13 || 2 
14-16 || 2 
17-19 ||| 3 
20-22 || 2 
23-25 || 2 
Total 15 


Some technical terms, whose meaning are given below, are used 
in the classification according to class interval : 


(a) Class limits : Class intervals are fixed by two values which
are called class limits. Each class has two limits : (i) lower limit and
(ii) upper limit. For example, 10 is the lower limit and 20 is the upper limit of
the class 10-20 in Illustration 1.

(b) Width of class interval : The difference between the upper limit
and the lower limit of a class is called the width of the class interval. In
Illustration 2, the width of the class interval 5-7 is 7 - 5 = 2.

(c) Mid-value : The mid-point of the two limits of a class is called the
mid-value. It is obtained by dividing the sum of the two limits by two. For
example, the mid-value of the class 10-20 is

$\dfrac{10 + 20}{2} = 15$


PRECAUTIONS 

The following points should be considered at the time of forming a 
frequency distribution : 

(1) The number of classes should be between 5 and 15. Actually 
the number of classes is fixed by considering their composition, 
purpose of the investigation and the calculation work. Prof. H.A. 
Sturges proposed the following formula for it : 

n= 1+ 3.322 log N 

where, n = No. of classes 


N = No. of observations 

Suppose the number of observations is 40. Then the number of
classes will be

n = 1 + 3.322 (log 40)

= 1 + 3.322 × 1.6021

= 1 + 5.322 = 6.322 or 6

Thus six classes will be made.

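(2) Width of class interval : The width of the class interval should be a
multiple of 5, e.g. 5, 10, 15, 20, 25, 100 etc. The calculation work
becomes easy with it. According to Prof. H.A. Sturges :

Width of class interval = $\dfrac{\text{The highest value of item} - \text{the lowest value of item}}{1 + 3.322 \log N}$

or $i = \dfrac{L - S}{1 + 3.322 \log N}$

where L is the highest value, S is the lowest value and N is the number of observations.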
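
Both of Sturges' formulas are easy to evaluate by machine. The sketch below is a minimal illustration; the function names are assumptions introduced here, the logarithm is taken to base 10 as in the worked calculation, and the highest and lowest values 95 and 15 are hypothetical figures chosen only to exercise the width formula.

```python
import math


def sturges_classes(n_obs):
    """Number of classes n = 1 + 3.322 log N, rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n_obs))


def sturges_width(highest, lowest, n_obs):
    """Width of class interval i = (highest - lowest) / (1 + 3.322 log N)."""
    return (highest - lowest) / (1 + 3.322 * math.log10(n_obs))


print(sturges_classes(40))                   # classes for 40 observations
print(round(sturges_width(95, 15, 40), 2))   # hypothetical range 15 to 95
```

In practice the computed width would then be rounded to a convenient figure such as a multiple of 5, as the precaution above recommends.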

(3) The lower limit of the first class should be either 0 or a multiple of
5. For example, if the lowest value of the data is 7 and we have taken
a width of class interval of 10, then the first class should be 0-10
rather than 7-17.

(4) To ensure continuity in the class-interval the exclusive method 
should be used. If the classes are made by inclusive method, it 
should be changed into exclusive method. 


(5) Width of all class intervals should be the same. Due to it, 
comparison of the classes and calculation work become easy. 

(6) If the classes are irregular, they should be regularised by taking 
the largest width of class-interval as a base. 


CUMULATIVE FREQUENCY TABLE 


Sometimes we want to know not only the number of observations in a
class but also the number of observations less than
or more than a particular limit. For this purpose the frequencies are
added successively. Frequencies added in this way are called cumulative
frequencies. When these are arranged in a table, the cumulative frequency
table is obtained. Such tables are of two types :

(i) less than (ii) more than. The cumulative frequency tables of both
types for Illustration 1 are as follows :

Variable        Cumulative       Variable        Cumulative
(X)             frequency        (X)             frequency
                (less than)                      (more than)
less than 10    5                more than 0     25
less than 20    12               more than 10    20
less than 30    19               more than 20    13
less than 40    23               more than 30    6
less than 50    25               more than 40    2
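
The two cumulative columns above are running totals of the class frequencies 5, 7, 7, 4, 2, taken from the top for the ‘less than’ type and from the bottom for the ‘more than’ type. A minimal Python sketch of this, with variable names that are assumptions made here, is:

```python
from itertools import accumulate

frequencies = [5, 7, 7, 4, 2]          # classes 0-10, 10-20, ..., 40-50
upper_limits = [10, 20, 30, 40, 50]
lower_limits = [0, 10, 20, 30, 40]

# 'Less than' type: add the frequencies from the first class downwards.
less_than = list(accumulate(frequencies))
# 'More than' type: add the frequencies from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for upper, lt, lower, mt in zip(upper_limits, less_than, lower_limits, more_than):
    print(f"less than {upper:<3} {lt:>3}    more than {lower:<3} {mt:>3}")
```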


UNIVARIATE FREQUENCY DISTRIBUTION 


When we collect data under any study and obtain various values
of only one variable from these data, they are called ‘univariate data’.
For example, the data related to the height of 50 students of a class.
The frequency distribution prepared from these data is called a
univariate frequency distribution.


Illustration 3. 


Explaining the method, prepare a discrete frequency distribution 
from the following data : 

8 12 15 18 12 15 20 22 15 25 

30 15 10 15 27 20 30 15 10 15 


Solution : 
Steps of constructing a Discrete Frequency Distribution : 


1. Prepare a table consisting of three columns. First column for 
variable, second column for tally marks and third column for 
frequency. 

2. Write all the values of variable in ascending or descending order 
from top to bottom. 


3. Read off all values of the variable carefully one by one and draw a
tally mark against each value. Draw one tally mark for each occurrence,
and draw every fifth tally across the preceding four ( ||||/ ). Start the
next group of tallies after a small gap. This process of marking
frequencies in groups of five is called the ‘Four and Crossed’ Method.
4. Count all the tally marks against each value of variable and 
write their number in the next column. 
5. Write down the total frequency in the last row at the bottom. 
Discrete Frequency Series
Variable    Tally Marks    Frequency
8           |              1
10          ||             2
12          ||             2
15          ||||/ ||       7
18          |              1
20          ||             2
22          |              1
25          |              1
27          |              1
30          ||             2
Total                      20


Illustration 4. 

Prepare a discrete frequency distribution from the following 
sentence in English : 

“Today there is hardly a phase of endeavour which does not find 
statistical devices at least occasionally useful.” 


Solution : The following data are obtained on writing the number of
letters in each English word :
5 5 2 6 1 5 2 9 5
4 3 4 11 7 2 5 12 6
The required discrete series is :

No. of Letters (x)    Tally Marks    No. of Words, Frequency (f)
1                     |              1
2                     |||            3
3                     |              1
4                     ||             2
5                     ||||/          5
6                     ||             2
7                     |              1
8                     -              0
9                     |              1
10                    -              0
11                    |              1
12                    |              1
Total                                18
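
The word lengths in this illustration can be counted by a few lines of Python. This is only a sketch: stripping punctuation with `string.punctuation` is an assumption made here about how the final full stop should be handled.

```python
import string
from collections import Counter

sentence = ("Today there is hardly a phase of endeavour which does not find "
            "statistical devices at least occasionally useful.")

# Number of letters in each word, ignoring punctuation attached to the word.
lengths = [len(word.strip(string.punctuation)) for word in sentence.split()]
frequency = Counter(lengths)

for x in range(1, max(lengths) + 1):
    print(f"{x:>2}  {frequency[x]}")   # a Counter returns 0 for lengths that do not occur
print("Total", len(lengths))
```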
Difference between Exclusive and Inclusive Methods

Exclusive Method :

1. The upper limit of a class is not included in that
class but is included in the next higher class.

2. The upper limit of a class and the lower limit of
the next higher class are the same.

3. This method is suitable in every situation.

4. It is not required to change an exclusive series
into an inclusive series for calculation.

Illustration 5. 


20 students appeared in an examination. The marks obtained out of
50 maximum marks are as follows : 5, 16, 17, 17, 20, 21, 22, 22, 22,
25, 25, 26, 26, 30, 31, 31, 34, 35, 42, 48.
Classify the data by the ‘Exclusive Method’ and by the ‘Inclusive Method’,
the width of the class-interval being 10.

Solution : The data are classified below in class intervals of
width 10 (i = 10) :

Inclusive Method                           Exclusive Method
Marks    Tally Marks    No. of Students    Marks    Tally Marks    No. of Students
1-10     |              1                  0-10     |              1
11-20    ||||           4                  10-20    |||            3
21-30    ||||/ ||||     9                  20-30    ||||/ ||||     9
31-40    ||||           4                  30-40    ||||/          5
41-50    ||             2                  40-50    ||             2
Total                   20                 Total                   20
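
The two classifications differ only in where the first class starts. The Python sketch below illustrates this for whole-number marks; the function `class_counts` is a hypothetical helper introduced here. It treats class k as running from its lower limit up to, but not including, the next lower limit, which for integer data reproduces the inclusive classes 1-10, 11-20, ... when the first lower limit is 1, and the exclusive classes 0-10, 10-20, ... when it is 0.

```python
marks = [5, 16, 17, 17, 20, 21, 22, 22, 22, 25,
         25, 26, 26, 30, 31, 31, 34, 35, 42, 48]


def class_counts(values, first_lower, width, n_classes):
    """Count integer values in n_classes consecutive classes of equal width.

    Class k starts at first_lower + k * width and stops just before the
    start of class k + 1. A value beyond the last class raises IndexError.
    """
    counts = [0] * n_classes
    for v in values:
        counts[(v - first_lower) // width] += 1
    return counts


exclusive = class_counts(marks, first_lower=0, width=10, n_classes=5)  # 0-10, 10-20, ...
inclusive = class_counts(marks, first_lower=1, width=10, n_classes=5)  # 1-10, 11-20, ...
print("Exclusive:", exclusive)
print("Inclusive:", inclusive)
```

The marks 20 and 30 are the ones that change class: each falls in the higher class under the exclusive method and in the lower class under the inclusive method.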


Illustration 6. 


Convert cumulative frequencies into ordinary frequencies : 
Variable No. of Students Variable No. of Students 

(A) less than 10 5 (B) more than 0 36 

less than 20 17 more than 10 31 

less than 30 30 more than 20 19 

less than 40 35 more than 30 6 

less than 50 36 more than 40 1 


Solution : 

(A) Marks        No. of Students (c.f.)    Marks (Group)    No. of Students (f)
Less than 10     5                         0-10             5
Less than 20     17                        10-20            (17 - 5) 12
Less than 30     30                        20-30            (30 - 17) 13
Less than 40     35                        30-40            (35 - 30) 5
Less than 50     36                        40-50            (36 - 35) 1
Total                                                       36


The rule for finding the ordinary frequency from the cumulative 
frequency less than type is that the cumulative frequency of a class 
is decreased by the cumulative frequency of preceding class. 

(B) Marks No. of Students ( c.f. ) Mark (Group) No. of Students ( f ) 

More than 0 36 0-10 (36 — 31) 5 

More than 10 31 10-20 (31 — 19) 12 

More than 20 19 20-30 (19 — 6) 13 

More than 30 6 30-40 (6 — 1) 5 

More than 40 1 40—50 1 


Total 36 


The rule for finding the ordinary frequency from the cumulative
frequency of the ‘more than’ type is that the cumulative frequency of a class
is decreased by the cumulative frequency of the succeeding class.
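
Both rules are simple successive differences, as the following Python sketch shows. The function names are assumptions introduced here for illustration.

```python
def from_less_than(cumulative):
    """Ordinary frequencies: each c.f. minus the c.f. of the preceding class."""
    preceding = [0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, preceding)]


def from_more_than(cumulative):
    """Ordinary frequencies: each c.f. minus the c.f. of the succeeding class."""
    succeeding = cumulative[1:] + [0]
    return [c - s for c, s in zip(cumulative, succeeding)]


print(from_less_than([5, 17, 30, 35, 36]))   # series (A)
print(from_more_than([36, 31, 19, 6, 1]))    # series (B)
```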


Illustration 7. 

Re-arrange the following series with equal intervals and then 
prepare ‘less than type’ and ‘more than type’ cumulative frequency 
distributions : 

Classes      0-5   5-6   6-9   9-12   12-17   17-18   18-20   20-24   24-25   25-30   30-36   Total
Frequency    3     2     7     5      16      12      15      20      8       10      2       100


Solution : 

On careful study, it is found that the largest width of the
classes is 6. Hence the limits of the regular classes may be taken as
0-6, 6-12, 12-18, 18-24, 24-30, 30-36, whose respective
frequencies will be 3 + 2, 7 + 5, 16 + 12, 15 + 20, 8 + 10, 2. The
required frequency distribution and cumulative frequency distributions
are as follows :

Class Interval    0-6    6-12    12-18    18-24    24-30    30-36    Total
Frequency         5      12      28       35       18       2        100


‘Less than type’ Cumulative Frequency     ‘More than type’ Cumulative Frequency

Variable        Numbers                   Variable        Numbers
Less than 6     5                         More than 0     95 + 5 = 100
Less than 12    5 + 12 = 17               More than 6     83 + 12 = 95
Less than 18    17 + 28 = 45              More than 12    55 + 28 = 83
Less than 24    45 + 35 = 80              More than 18    20 + 35 = 55
Less than 30    80 + 18 = 98              More than 24    2 + 18 = 20
Less than 36    98 + 2 = 100              More than 30    2
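
The regrouping of the irregular classes into classes of width 6 can be sketched in Python as below. The function `regroup` is a hypothetical helper written for this illustration; it assumes that every original class lies wholly inside one regular class, which holds for the classes of this illustration, and raises an error otherwise.

```python
classes = [(0, 5), (5, 6), (6, 9), (9, 12), (12, 17), (17, 18),
           (18, 20), (20, 24), (24, 25), (25, 30), (30, 36)]
frequencies = [3, 2, 7, 5, 16, 12, 15, 20, 8, 10, 2]


def regroup(classes, frequencies, width):
    """Merge irregular classes into regular classes of the given width."""
    start = classes[0][0]
    end = classes[-1][1]
    merged = [0] * ((end - start) // width)
    for (lower, upper), f in zip(classes, frequencies):
        k = (lower - start) // width             # index of the regular class
        if upper > start + (k + 1) * width:      # class would straddle two regular classes
            raise ValueError(f"class {lower}-{upper} does not fit width {width}")
        merged[k] += f
    return merged


regular = regroup(classes, frequencies, width=6)
print(regular)
```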


Illustration 8. 

Using Sturges’s rule n = 1 + 3.322 log N, where n is the number 
of classes, N is the total number of observation, classify, in equal 
class-intervals, the following data of hours worked by 50 workers for 


a period of a month in a certain factory : 


110 165 113 42 149 175 133 69 30 104 
161 195 121 62 187 157 151 93 138 184 
155 141 143 156 197 108 103 140 167 87 
164 150 144 124 40 128 162 71 164 122 
114 149 94 145 203 178 79 87 116 148 
Solution : 
No. of Class Intervals, n = 1 + 3.322 log N = 1 + 3.322 log 50
= 1 + 3.322 × 1.699 = 1 + 5.644 = 6.644 or 7 approx.

Class interval (C.I.) or $i = \dfrac{203 - 30}{7} = \dfrac{173}{7} = 24.71$ or 25 approx.

Taking 7 classes, each of width 25, the above data will be
classified in the following way, and the first class will start from the
minimum value 30 :


Class-Intervals (Hours)    Tally Marks          No. of Workers (Frequency)
30-55                      |||                  3
55-80                      ||||                 4
80-105                     ||||/ |              6
105-130                    ||||/ ||||           9
130-155                    ||||/ ||||/ ||       12
155-180                    ||||/ ||||/ |        11
180-205                    ||||/                5
Total                                           50
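
The whole of this illustration can be retraced in a few lines of Python. The sketch below is illustrative only: rounding the number of classes to the nearest whole number and rounding the width up to the next multiple of 5 are assumptions made here to match the figures 7 and 25 used in the solution.

```python
import math

hours = [
    110, 165, 113, 42, 149, 175, 133, 69, 30, 104,
    161, 195, 121, 62, 187, 157, 151, 93, 138, 184,
    155, 141, 143, 156, 197, 108, 103, 140, 167, 87,
    164, 150, 144, 124, 40, 128, 162, 71, 164, 122,
    114, 149, 94, 145, 203, 178, 79, 87, 116, 148,
]

n_classes = round(1 + 3.322 * math.log10(len(hours)))   # Sturges' rule
raw_width = (max(hours) - min(hours)) / n_classes       # range / number of classes
width = math.ceil(raw_width / 5) * 5                    # next multiple of 5
start = min(hours)                                      # first class starts at the minimum

counts = [0] * n_classes
for h in hours:
    counts[(h - start) // width] += 1                   # exclusive classes

for k, f in enumerate(counts):
    lower = start + k * width
    print(f"{lower}-{lower + width}: {f}")
```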


BIVARIATE FREQUENCY DISTRIBUTION 


Sometimes we have to take two measurements on each item.
Such data are known as ‘bivariate data’, for example, the
marks obtained by 50 students in two subjects, Statistics and
Mathematics. The frequency distribution prepared from such data is
called a ‘bivariate frequency distribution’.

Illustration 9.

The data given below relate to the heights and weights of 20
persons. You are required to form a two-way frequency table with
class-intervals 62-64 inches, 64-66 inches and so on, and 115 to 125 lb,
125 to 135 lb and so on.

S. No.    Weight    Height    S. No.    Weight    Height
1         170       70        11        163       70
2         135       65        12        139       67
3         136       65        13        122       63
4         137       64        14        134       68
5         148       69        15        140       67
6         124       63        16        132       69
7         117       65        17        120       66
8         128       70        18        148       68
9         143       71        19        129       67
10        129       62        20        152       67


Solution :
Tally-Sheet

Weight (X)    Height (Y) in inches
in lb         62-64    64-66    66-68    68-70    70-72    Total
115-125       ||       |        |                          4
125-135       |                 |        ||       |        5
135-145                |||      ||                |        6
145-155                         |        ||                3
155-165                                           |        1
165-175                                           |        1
Total         3        4        5        4        4        20

On writing numbers in place of the tally marks, the two-way frequency
distribution is obtained as given below :

Two-way Frequency Distribution

Weight (X)    Height (Y) in inches                         Total
in lb         62-64    64-66    66-68    68-70    70-72    (fx)
115-125       2        1        1        -        -        4
125-135       1        -        1        2        1        5
135-145       -        3        2        -        1        6
145-155       -        -        1        2        -        3
155-165       -        -        -        -        1        1
165-175       -        -        -        -        1        1
Total (fy)    3        4        5        4        4        20
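
A two-way table of this kind is a count of pairs of classes. The Python sketch below rebuilds it from the 20 pairs of Illustration 9; the helper `class_of` and the use of exclusive classes starting at 115 lb and 62 inches are assumptions made here to match the class-intervals asked for.

```python
from collections import Counter

weights = [170, 135, 136, 137, 148, 124, 117, 128, 143, 129,
           163, 139, 122, 134, 140, 132, 120, 148, 129, 152]
heights = [70, 65, 65, 64, 69, 63, 65, 70, 71, 62,
           70, 67, 63, 68, 67, 69, 66, 68, 67, 67]


def class_of(value, start, width):
    """Return the exclusive class (lower, upper) that contains the value."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)


# Each cell is keyed by (weight class, height class).
table = Counter(
    (class_of(w, 115, 10), class_of(h, 62, 2)) for w, h in zip(weights, heights)
)
row_totals = Counter(class_of(w, 115, 10) for w in weights)   # fx
col_totals = Counter(class_of(h, 62, 2) for h in heights)     # fy

for w_class in sorted(row_totals):
    cells = [table[(w_class, h_class)] for h_class in sorted(col_totals)]
    print(w_class, cells, row_totals[w_class])
```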
THEORETICAL QUESTIONS 


Long Answer Questions 

1. Discuss the considerations that should guide you in determining (i) the number
of classes, (ii) the magnitude of class-intervals, and (iii) the class-limits.

2. What are the general guidelines for determining the number
of classes and the width of classes in the construction of frequency distributions ?
Answer with examples.


3. From the passage given below prepare a discrete frequency series : 

‘A tax’ said Professor Seligman, “is a compulsory contribution from a person to 
the government to defray the expenses incurred in the common interest of all, 
without reference to special benefits conferred.” 

[Ans.

No. of Letters    1   2   3   4   5   6   7   8   9   10   11   12
No. of Words      3   6   5   2   0   3   2   5   3   2    0    1]


4. From the lines given below prepare a frequency table : 

“All, fill the cup what boots it To Repeat How time is slipping, under neath, out 
feet, unborn, tomorrow, and dead, yesterday why fret, about, them, today be 
sweet.” 


[Ans. No. of letters in words : 3, 4, 3, 3, 4, 5, 2, 2, 6, 3, 4, 2, 8, 5, 5, 3, 4, 6, 8,
3, 4, 9, 3, 4, 5, 4, 5, 2, 5

x :   1   2   3   4   5   6   7   8   9   Total
f :   0   4   7   7   6   2   0   2   1   29]

5. Present the following scores in the form of a frequency distribution :

5, 7, 1, 4, 3, 5, 6, 4, 7, 2, 6, 3, 4, 6, 5, 7, 1, 7, 3, 2

[Ans. C.I.         1-3    3-5    5-7    7-8
Frequency     4      6      6      4]

6. Present the following marks in Statistics obtained by 20 students in the form of
a grouped frequency distribution. Use both exclusive and inclusive
class-intervals :
10 2 6 11 10 7 14 18 20 5
17 14 11 3 22 12 10 9 13 16

[Ans.
Exclusive Class-intervals    0-5    5-10    10-15    15-20    20-25
Frequency                    2      4       9        3        2

Inclusive Class-intervals    0-4    5-9     10-14    15-19    20-24
Frequency                    2      4       9        3        2]


7. Prepare a frequency distribution taking 4 as the width of the class-intervals and
using the inclusive method :
10 17 15 22 11 16 19 24 29 18 25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 21 28 38 43 15 24 13 10 16 20 22 29 23 31

[Ans. C.I. :       10-13   14-17   18-21   22-25   26-29   30-33   34-37   38-41   42-45
Frequency     5       8       7       8       5       3       1       1       1]
8. In an examination the marks obtained by 20 students (out of maximum marks 
25) are as follows. Prepare a bivariate frequency distribution (or bivariate 


frequency table) (class 0-5, 5—10) 
Marks Marks 
R. No. Statistics Economics R. No. Statistics Economics 
15911134 
217161210 10 
366 13 15 13 
4011429 
5191915 11 14 
611716168 
7851713 11 
814111837 
9121993 
10181220417 


9. Prepare a bivariate frequency table for the marks obtained by 24 examinees in
Statistics and Accountancy given below : (class 0-5, 5-10)
          Marks                                  Marks
R. No.    Statistics    Accountancy    R. No.    Statistics    Accountancy
1         22            16             13        23            16
2         23            16             14        25            17
3         23            18             15        23            17
4         23            16             16        22            17
5         23            16             17        27            15
6         24            17             18        27            16
7         23            16             19        26            18
8         25            19             20        28            19
9         22            16             21        25            19
10        23            18             22        24            16
11        24            18             23        23            17
12        24            17             24        24            19


OBJECTIVE QUESTIONS 


Choose the correct answers : 


1. In an exclusive series : 
(a) both class limits are considered (b) the upper limit is excluded 
(c) both limits are excluded (d) the lower limit is excluded 


2. Difference of upper and lower limit of a class is called : 
(a) Class frequency (b) Magnitude of class-interval 
(c) Class limits (d) Mid-point 


3. In usual notation Sturges’ formula is : 
(a) N = 1 + 3.322 log N (b) n = 1 - 3.322 log N
(c) n = 1 + 3.322 log N (d) None of these


4. Mid-value of 50—60 is : 
(a) 50 (b) 55 (c) 60 (d) 54 


5. If the class-intervals in a frequency distribution are 72—73.9, 74—75.9, 76— 


77.9, 78—79.9, then the mid-value of the class 74—75.9 is : 
(a) 74.50 (b) 74.90 (c) 74.95 (d) 75.00 


[ Ans. 1. (b), 2. (b), 3. (c), 4. (b), 5. (c).]


Inclusive Method (points corresponding to those listed for the Exclusive Method
under ‘Difference between Exclusive and Inclusive Methods’) :

1. Both limits of the class are included in that class.

2. The upper limit of a class and the lower limit of the next higher class
are not the same; the difference is generally taken as 1.

3. This method is suitable when the observations are given in integers.

4. For simplicity of calculation, an inclusive series is converted into
an exclusive series.
MEASURES OF CENTRAL
TENDENCY 


INTRODUCTION AND DEFINITION 


We cannot obtain a single general idea about the data merely by
representing them through tabulation and frequency distribution. For this
purpose we need certain characteristics of the data which can represent
the data as a whole.

Such characteristics are called measures of central tendency.
They are also called averages. Some values of the series are more than
the average and some values are less than it.

According to Simpson and Kafka, “A measure of central tendency 
is a typical value around which other values aggregate.” 

According to Watkins, “Average is a representative figure which is 
‘gist’ out the substance of statistics.” 

It is clear from the aforesaid definitions that an average is a simple
and condensed value that represents the values of all items. Therefore, it is
also known as a measure of central tendency.


FUNCTION OF AVERAGE 

The main functions of a statistical average are :

(1) To present a brief picture : The average presents a brief
picture of the series. It is not possible to keep in mind each item of a huge
series, but it is easy to remember a single value in the form of its average.
For example, data related to the national income, population and industrial
production of India.

(2) To facilitate comparison : Two or more series can be
compared easily by presenting the data in the form of an average.
For example, to compare the income of the people of India and
America, it is not possible to compare the income of each person,
but it becomes possible to compare the average income of the two
countries.

(3) To represent the entire group : An average represents the
characteristics of the whole series. The characteristics
of the entire group cannot be known from the study of individual units.
Also, individual units are unstable while averages have the tendency
to remain stable. For example, the daily sale of a general storekeeper
remains unstable, but the average monthly sale generally
remains stable.

(4) Basis of statistical analysis : Various measures of statistical
analysis such as dispersion, correlation, regression etc. are
interpreted on the basis of the average.

(5) Helpful in decision : The average plays an important role in
various types of business decisions. For example, the decision to
increase or decrease the number of trains on a railway track is taken
on the basis of the average number of passengers travelling on that
track.

(6) Basis of mathematical relation : If we have to express two different
series in mathematical form, we take the help of averages. If it is
said that the income of people in America is more than the income of
people in India, no clear-cut knowledge is obtained from it; but if
we present the income per person of both the countries in
the form of averages, the knowledge will be clearer and more valid.


CHARACTERISTICS OF AN IDEAL AVERAGE 


According to Prof. G. Udny Yule, the following characteristics 


should be satisfied by an ideal average : 
(1) It should be rigidly defined. 
(2) It should be easily understandable. 


(3) Its calculation should be based on all observations and should 
be easy. 

(4) It should be least affected by fluctuations of sampling. 

(5) It should be capable of algebraic treatment. 

(6) It should be least affected by extreme values. 


KINDS OF STATISTICAL AVERAGES 


The following are the important types of averages : 

(A) Mathematical Average 

(1) Arithmetic Average (2) Geometric Mean 

(3) Harmonic Mean (4) Quadratic Mean 

(B) Positional Average 

(1) Median (2) Mode 

(C) Commercial Average 

(1) Moving Average (2) Progressive Average 

(3) Composite Average 

In this chapter we shall describe only some important averages 
which are as follows : 


ARITHMETIC AVERAGE OR MEAN 


Arithmetic average is the simplest, most suitable and most popular
of all mathematical averages, and it is widely used in practice. In
ordinary usage it is simply called the 'average'. It is obtained when
the sum of all items is divided by the total number of items. It may
be defined as follows :

“The arithmetic average is the value obtained by dividing the sum
of the values of all items in a series by the number of items
constituting the series.” It is denoted by ‘a’ or $\bar{X}$.
Arithmetic average is of two types :
(I) Simple Mean or Simple Arithmetic Average
(II) Weighted Arithmetic Mean


(I) Simple Arithmetic Mean : When equal importance is given to all
the items of the series and the mean is calculated by dividing the
sum of all items by the number of items, the mean so obtained is
called the simple arithmetic mean.

Methods of Calculating Simple Average or Mean
(1) Calculation of simple mean in individual series : The value
of each item in an individual series is independent. Only the items
are given in this series and they have no frequency. Simple
arithmetic mean in an individual series is calculated by the
following three methods :
(1) Direct method (2) Short-cut method
(3) Step-deviation method


Direct Method :

The arithmetic mean of an individual series is calculated by the
direct method as follows :

(i) We find the sum of the values of all items ($\sum X$).
(ii) We then find the number of items (N).
(iii) Then we calculate the arithmetic mean by using the following
formula :

$\bar{X} = \frac{\sum X}{N}$ or $a = \frac{\sum X}{N}$
Illustration 1.
From the following ages of 8 persons, calculate the arithmetic mean :
Age (in years) : 16 20 18 25 35 43 22 21
Solution :
S.No.    Age in years (X)
1        16
2        20
3        18
4        25
5        35
6        43
7        22
8        21
N = 8    $\sum X$ = 200

$\bar{X} = \frac{\sum X}{N} = \frac{200}{8} = 25$ years
Short-cut Method :

(i) In this method we take an assumed mean (A), either from the
given items or different from them, and find the deviation of each
item from this assumed mean, i.e.,

dx = X - A

(ii) Then we find the sum of the deviations ($\sum dx$) and the
number of items (N).

(iii) We find $\bar{X}$ from the following formula :

$\bar{X} = A + \frac{\sum dx}{N}$
Illustration 2.
Calculate arithmetic mean by short-cut method :
15 18 25 40 50 20

Solution :
S.No.    Marks (X)    Deviation dx (A = 25)
1        15           -10
2        18           -7
3        25           0
4        40           +15
5        50           +25
6        20           -5
N = 6                 $\sum dx$ = +18

Formula : $\bar{X} = A + \frac{\sum dx}{N} = 25 + \frac{18}{6} = 25 + 3 = 28$ Marks
Step-deviation Method :

When the values of X are large and their deviations have a common
factor, this method is used to simplify the calculation. Its
procedure is as follows :

(i) We find dx as in the previous method.

(ii) We find dx' by dividing dx by the common factor (i), i.e.,
dx' = dx / i.

(iii) Then we calculate the simple arithmetic mean by the following
formula :

$\bar{X} = A + \frac{\sum dx'}{N} \times i$

Illustration 3.

From the following data, calculate arithmetic mean by using step deviation
method :

20 40 30 80 200 50 240 70

Solution :
S.No.    Marks (X)    Deviation dx (A = 80)    Step deviation dx' (i = 10)
1        20           -60                      -6
2        40           -40                      -4
3        30           -50                      -5
4        80           0                        0
5        200          +120                     +12
6        50           -30                      -3
7        240          +160                     +16
8        70           -10                      -1
N = 8                                          $\sum dx'$ = +9

Formula : $\bar{X} = A + \frac{\sum dx'}{N} \times i = 80 + \frac{9}{8} \times 10 = 80 + 11.25 = 91.25$
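
The three methods for an individual series can be checked against one another with a short Python sketch. The function names below are illustrative only and are not part of the text; the data are those of Illustration 3.

```python
def mean_direct(items):
    """Direct method: sum of the items divided by their number."""
    return sum(items) / len(items)


def mean_shortcut(items, assumed):
    """Short-cut method: A + (sum of deviations from A) / N."""
    deviations = [x - assumed for x in items]
    return assumed + sum(deviations) / len(items)


def mean_step_deviation(items, assumed, factor):
    """Step-deviation method: A + (sum of dx / i) / N * i."""
    steps = [(x - assumed) / factor for x in items]
    return assumed + sum(steps) / len(items) * factor


marks = [20, 40, 30, 80, 200, 50, 240, 70]  # data of Illustration 3
print(mean_direct(marks))
print(mean_shortcut(marks, 80))
print(mean_step_deviation(marks, 80, 10))
```

All three calls should agree, because the assumed mean and the common factor only simplify the arithmetic and do not change the result.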
(2) Calculation of simple mean in discrete series : As in an
individual series, the mean in a discrete series can be calculated
by three methods. Their descriptions are as follows :
Direct Method :

(i) We represent the item by X and the frequency by f.
(ii) We calculate the product of X and f for each item.
(iii) We find the sum of the above products ($\sum fX$) and the sum
of frequencies ($\sum f$).
(iv) We calculate the mean by using the following formula :

$\bar{X} = \frac{\sum fX}{\sum f}$


Illustration 4.
From the following series, calculate mean :

Value        5    6    7    8    9    10
Frequency    2    4    9    12   10   3

Solution :
Item (X)    Frequency (f)    Product (fX)
5           2                10
6           4                24
7           9                63
8           12               96
9           10               90
10          3                30
            $\sum f$ = 40    $\sum fX$ = 313

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{313}{40} = 7.825$
Short-cut Method :
(i) In this method we calculate the deviation of every item from an
assumed mean ( dx = X - A ).
(ii) We find the product of f and dx for each item.
(iii) Finding $\sum f\,dx$ and $\sum f$, we calculate the mean by the
following formula :
$\bar{X} = A + \frac{\sum f\,dx}{\sum f}$
Since $\sum f$ = N, this formula can also be written as follows :
$\bar{X} = A + \frac{\sum f\,dx}{N}$
Illustration 5.
Calculate mean from the following data :

Marks              10    20    30    40    50
No. of Students    8     10    20    15    7

Solution :
X     f     dx (A = 30)    fdx
10    8     -20            -160
20    10    -10            -100
30    20    0              0
40    15    10             150
50    7     20             140
      $\sum f$ = 60        $\sum f\,dx$ = 30

$\bar{X} = A + \frac{\sum f\,dx}{\sum f} = 30 + \frac{30}{60} = 30.5$ Marks
Step-deviation Method :

(i) In this method we find dx as in the short-cut method.

(ii) We find dx' (or ds) by dividing the dx of each item by the
greatest common factor (i) :

dx' or ds = $\frac{dx}{i}$

(iii) Multiplying each dx' by the corresponding frequency (f), we
find fdx'.

(iv) We find $\sum f\,dx'$ and $\sum f$ (or N) and calculate the mean
by using the following formula :

$\bar{X} = A + \frac{\sum f\,dx'}{N} \times i$

Illustration 6.

From the following frequency distribution, calculate mean :

Marks       15    30    45    60    75    90

Students    8     12    18    30    12    10

Solution :
X     f     dx (A = 45)    dx' (i = 15)    fdx'
15    8     -30            -2              -16
30    12    -15            -1              -12
45    18    0              0               0
60    30    +15            +1              30
75    12    +30            +2              24
90    10    +45            +3              30
      $\sum f$ = 90                        $\sum f\,dx'$ = +56

$\bar{X} = A + \frac{\sum f\,dx'}{\sum f} \times i = 45 + \frac{56}{90} \times 15 = 54.33$ marks
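
The discrete-series formulas can be sketched in Python as a single function. The name `discrete_mean` and its parameters are assumptions made for illustration. With the defaults it is the direct method; with an assumed mean and a common factor it is the step-deviation method.

```python
def discrete_mean(values, freqs, assumed=0, factor=1):
    """Mean of a discrete series: A + (sum of f * dx') / N * i.

    With assumed=0 and factor=1 this reduces to the direct method,
    sum(f * X) / sum(f).
    """
    n = sum(freqs)
    total = sum(f * (x - assumed) / factor for x, f in zip(values, freqs))
    return assumed + total / n * factor


marks = [15, 30, 45, 60, 75, 90]      # data of Illustration 6
students = [8, 12, 18, 30, 12, 10]
print(round(discrete_mean(marks, students), 2))
print(round(discrete_mean(marks, students, assumed=45, factor=15), 2))
```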

(3) Calculation of arithmetic mean in continuous series : Items
are given in groups (classes). To calculate the mean we first find
the mid-value of each group. For this we divide the sum of both
limits ($L_1$ and $L_2$) by two :

Mid Value = $\frac{L_1 + L_2}{2}$

The continuous series changes into a discrete series after finding
the mid-values. We then calculate the mean in the same way as in the
previous methods. The formulae are as follows :

Direct Method : $\bar{X} = \frac{\sum fX}{\sum f}$

Short-cut Method : $\bar{X} = A + \frac{\sum f\,dx}{\sum f}$

Step-deviation Method : $\bar{X} = A + \frac{\sum f\,dx'}{\sum f} \times i$

We can also use N in place of $\sum f$ in all the above formulae.


Illustration 7.
From the following data calculate mean :

Age                5-7    8-10    11-13    14-16    17-19
No. of students    7      12      19       10       2

Solution :

The above series is an inclusive series, but there is no need to
convert it into an exclusive series to find the mean. So this question
can be solved by the direct, short-cut and step-deviation methods in
the following way :

Direct Method :

Age      No. of Students (f)    Mid value (X)    fX
5-7      7                      6                42
8-10     12                     9                108
11-13    19                     12               228
14-16    10                     15               150
17-19    2                      18               36
         $\sum f$ = 50                           $\sum fX$ = 564

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{564}{50} = 11.28$

Short-Cut Method :

Age      Mid value (X)    f     dx (A = 12)    fdx
5-7      6                7     -6             -42
8-10     9                12    -3             -36
11-13    12               19    0              0
14-16    15               10    3              30
17-19    18               2     6              12
                          $\sum f$ = 50        $\sum f\,dx$ = -36

$\bar{X} = A + \frac{\sum f\,dx}{\sum f} = 12 + \frac{-36}{50} = 12 - 0.72 = 11.28$
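Step-deviation Method :

Age      Mid value (X)    f     dx (A = 12)    dx' (i = 3)    fdx'
5-7      6                7     -6             -2             -14
8-10     9                12    -3             -1             -12
11-13    12               19    0              0              0
14-16    15               10    3              1              10
17-19    18               2     6              2              4
                          $\sum f$ = 50                       $\sum f\,dx'$ = -12

$\bar{X} = A + \frac{\sum f\,dx'}{\sum f} \times i = 12 + \frac{-12}{50} \times 3 = 12 - 0.24 \times 3 = 11.28$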
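
For a continuous series the only extra step is replacing each class by its mid-value. A minimal Python sketch follows, using the data of Illustration 7; the name `grouped_mean` is hypothetical and chosen here for illustration.

```python
def grouped_mean(classes, freqs):
    """Mean of a continuous series using class mid-values.

    classes: list of (lower limit, upper limit) pairs
    freqs:   list of class frequencies
    """
    mids = [(low + high) / 2 for low, high in classes]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


ages = [(5, 7), (8, 10), (11, 13), (14, 16), (17, 19)]  # Illustration 7
students = [7, 12, 19, 10, 2]
print(grouped_mean(ages, students))
```

The inclusive limits are used as given, since the mid-value of a class is the same whether the class is written as 5-7 or as 4.5-7.5.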


(II) Weighted Arithmetic Mean : Equal importance is given to all
items in the simple arithmetic mean, but in practice this does not
always hold: the relative importance of an item may be less or more.
The arithmetic mean found by giving weights to the items according
to their importance is called the weighted arithmetic mean. This mean
is used when the units are not homogeneous and not of equal
importance.

Methods of calculating weighted Arithmetic Mean : The method of
computing this mean is the same as the method of computing the
simple mean.

The only difference is in the symbols for weight and frequency. The
weight ( w ) is used in place of the frequency ( f ). Hence its
formulae will be :

Direct Method : $\bar{X}_w = \frac{\sum wX}{\sum w}$

Short-cut Method : $\bar{X}_w = A + \frac{\sum w\,dx}{\sum w}$

Step deviation Method : $\bar{X}_w = A + \frac{\sum w\,dx'}{\sum w} \times i$
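
A small Python sketch of the direct formula for the weighted mean is given below. The prices and weights are hypothetical figures chosen only to show how the weights change the result; they do not come from the text.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: prices of three goods weighted by quantity bought.
prices = [40, 50, 60]
quantities = [5, 3, 2]

print(sum(prices) / len(prices))          # simple mean ignores the weights
print(weighted_mean(prices, quantities))  # weighted mean leans towards 40
```

Here the simple mean is 50, while the weighted mean is 47 because the cheapest item carries the largest weight.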
Merits, Demerits and Uses of Arithmetic Mean 

Merits : 

(1) Arithmetic mean is rigidly defined. 

(2) It is based on all observations. 

(3) Its calculation is very easy. 

(4) It is least affected by the fluctuations of sampling. 

(5) It is capable of algebraic treatment. 

(6) Sum of deviations of items about arithmetic mean is zero. 
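
Merit (6) follows directly from the definition of the mean. As a short derivation, using the symbols of this chapter for an individual series of N items:

```latex
\sum (X - \bar{X}) = \sum X - N\bar{X}
                   = \sum X - N \cdot \frac{\sum X}{N}
                   = \sum X - \sum X
                   = 0
```

For example, for the items 15, 18, 25, 40, 50, 20 of Illustration 2 the mean is 28, and the deviations -13, -10, -3, +12, +22, -8 add up to zero.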

Demerits : 

(1) Its value cannot be determined by inspection only or by graph. 

(2) It is affected very much by extreme values. It gives greater 
importance to bigger items. A millionaire would greatly affect the 
average income of a town where the majority consists of ill paid 
persons. 

(3) It is possible that the arithmetic mean may not be equal to any
value in the actual data. For example, the arithmetic mean of 400,
500 and 700 is 1600/3 = 533.33, which is not in the given data.


(4) It cannot be used in qualitative studies. 

(5) We can draw fallacious conclusions when the actual figures
are not given. For example, the marks of a student in B.Sc. I, II, III
are 50%, 60%, 70%, so his average percentage of marks is 60%.

The marks of another student in B.Sc. I, II, III are 70%, 60%, 50%,
so his average percentage of marks is also 60%; but the first student
is improving while the marks of the second student are falling.

Uses : 

(1) It is used in the study of economic, industrial and commercial
problems. For example, it is used to find average income, average
production, average weight, average price etc.

(2) It is used to calculate other measures such as coefficient of 
variation, correlation coefficient etc. 

(3) It has its own usefulness in comparative study. 


MISCELLANEOUS EXAMPLES 


Illustration 8. 
The marks obtained by 10 students in business statistics are given below : 
17, 26, 38, 20, 35, 22, 20, 40, 18, 24. 
Calculate arithmetic mean. 


Solution :
$\sum X$ = 17 + 26 + 38 + 20 + 35 + 22 + 20 + 40 + 18 + 24 = 260
Arithmetic mean $\bar{X} = \frac{\sum X}{N} = \frac{260}{10} = 26$


Illustration 9.
Calculate mean for the following data :

Height (in cm)    65    66    67    68    69    70    71    72    73
No. of Plants     1     4     5     7     11    10    6     4     2

Solution :

Let A = 69

X        f     dx = X - A    fdx
65       1     -4            -4
66       4     -3            -12
67       5     -2            -10
68       7     -1            -7
69       11    0             0
70       10    1             10
71       6     2             12
72       4     3             12
73       2     4             8
Total    50                  9

Arithmetic Mean, $\bar{X} = A + \frac{\sum f\,dx}{N} = 69 + \frac{9}{50} = 69 + 0.18 = 69.18$
Illustration 10.
Find mean of the following distribution :
Class        0-7    7-14    14-21    21-28    28-35    35-42    42-49
Frequency    19     25      36       72       51       43       28

Solution :

A = 24.5, i = 7

Class    f      Mid value (X)    dx = X - 24.5    fdx
0-7      19     3.5              -21              -399
7-14     25     10.5             -14              -350
14-21    36     17.5             -7               -252
21-28    72     24.5             0                0
28-35    51     31.5             7                357
35-42    43     38.5             14               602
42-49    28     45.5             21               588
Total    274                                      546

Arithmetic Mean, $\bar{X} = A + \frac{\sum f\,dx}{N} = 24.5 + \frac{546}{274} = 24.5 + 1.99 = 26.49$


Illustration 11. 


Find mean marks of students from the following table : 


Marks No. of students Marks No. of students 
More than 0 80 More than 60 28 
More than 10 77 More than 70 16 
More than 20 72 More than 80 10 
More than 30 65 More than 90 8 
More than 40 55 More than 100 0 
More than 50 43 
Solution :

Class     Frequency (f)    Mid value (X)    dx (A = 55)    fdx
0-10      80-77 = 3        5                -50            -150
10-20     77-72 = 5        15               -40            -200
20-30     72-65 = 7        25               -30            -210
30-40     65-55 = 10       35               -20            -200
40-50     55-43 = 12       45               -10            -120
50-60     43-28 = 15       55               0              0
60-70     28-16 = 12       65               10             120
70-80     16-10 = 6        75               20             120
80-90     10-8 = 2         85               30             60
90-100    8-0 = 8          95               40             320
Total     80                                $\sum f\,dx$ = -260

$\bar{X} = A + \frac{\sum f\,dx}{N} = 55 + \frac{-260}{80} = 55 - 3.25 = 51.75$
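
The conversion of a 'more than' cumulative table into simple class frequencies is just a run of successive differences. A Python sketch with the figures of Illustration 11 is shown below; the function name is an assumption made for illustration.

```python
def frequencies_from_more_than(cumulative):
    """Turn 'more than' cumulative counts into class frequencies.

    Each class frequency is the count above its lower limit minus the
    count above its upper limit.
    """
    return [a - b for a, b in zip(cumulative, cumulative[1:])]


limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
more_than = [80, 77, 72, 65, 55, 43, 28, 16, 10, 8, 0]

freqs = frequencies_from_more_than(more_than)
mids = [(low + high) / 2 for low, high in zip(limits, limits[1:])]
mean = sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)

print(freqs)
print(mean)
```

A 'less than' table, as in Illustration 14, can be handled the same way by subtracting each cumulative count from the next one instead.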


Illustration 12. 

Find the missing frequency from the following data, it is being given that 19.9 is 
the average number of tablets for being cured : 
No. of tablets No. of person cured 


4—8 11 
8—12 13 
12—16 16 
16—20 14 
20—24 ? 
24—28 9 
28—32 17 
32—36 6 
36—40 4 


Solution :
Let the missing frequency be V.
Class Interval    Frequency (f)    Mid value (X)    dx        fdx
4-8               11               6                -16       -176
8-12              13               10               -12       -156
12-16             16               14               -8        -128
16-20             14               18               -4        -56
20-24             V                22 = A           0         0
24-28             9                26               4         36
28-32             17               30               8         136
32-36             6                34               12        72
36-40             4                38               16        64
Total             90 + V                                      -208

$\bar{X} = A + \frac{\sum f\,dx}{N}$

=> $19.9 = 22 + \frac{-208}{90 + V}$
=> $19.9 - 22 = \frac{-208}{90 + V}$
=> $-2.1 = \frac{-208}{90 + V}$
=> -189 - 2.1 V = -208
=> -2.1 V = -208 + 189
=> -2.1 V = -19
=> $V = \frac{-19}{-2.1} = 9.04$, i.e., approximately 9
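
The same missing frequency can be found without an assumed mean by solving $\bar{X}(N_0 + V) = S_0 + X_v V$ for V, where $N_0$ and $S_0$ are the total frequency and the total of fX for the known classes and $X_v$ is the mid-value of the class with the missing frequency. The Python sketch below is one way to write this; the function name and the use of `None` to mark the unknown frequency are assumptions made for illustration.

```python
def missing_frequency(mids, freqs, mean):
    """Solve for the single unknown frequency (marked None) given the mean."""
    k = freqs.index(None)
    known = [(f, m) for f, m in zip(freqs, mids) if f is not None]
    known_n = sum(f for f, _ in known)
    known_sum = sum(f * m for f, m in known)
    # mean * (known_n + V) = known_sum + mids[k] * V, solved for V
    return (known_sum - mean * known_n) / (mean - mids[k])


mids = [6, 10, 14, 18, 22, 26, 30, 34, 38]     # Illustration 12
freqs = [11, 13, 16, 14, None, 9, 17, 6, 4]
print(round(missing_frequency(mids, freqs, 19.9), 2))
```

The method fails if the given mean equals the mid-value of the class with the missing frequency, because the equation then no longer determines V.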


Illustration 13. 

Find out arithmetic mean from the following table : 
Age Groups Frequency 
80-89 2 
70-79 2 
60-69 6 
50-59 20 
40-49 56 
30-39 40 
20-29 42 
10-19 32 


Solution :

The above question is in inclusive series. We shall convert it into
exclusive series.

Class        Frequency (f)    Mid value (X)    dx' = (X - A)/i    fdx'
79.5-89.5    2                84.5             4                  8
69.5-79.5    2                74.5             3                  6
59.5-69.5    6                64.5             2                  12
49.5-59.5    20               54.5             1                  20
39.5-49.5    56               44.5             0                  0
29.5-39.5    40               34.5             -1                 -40
19.5-29.5    42               24.5             -2                 -84
9.5-19.5     32               14.5             -3                 -96
Total        200                                                  -174

Arithmetic Mean $\bar{X} = A + \frac{\sum f\,dx'}{N} \times i$,
where A = 44.5, i = 10

$= 44.5 + \frac{-174}{200} \times 10$

= 44.5 - 8.7 = 35.8 Years


Illustration 14. 
Find out arithmetic mean from the following table : 


Marks (Less than) No. of students 


10 15 
20 35 
30 60 
40 84 
50 96 
60 127 
70 198 
80 250 


Solution :

Class    Frequency (f)       Mid value (X)    dx' = (X - A)/i    fdx'
0-10     15                  5                -4                 -60
10-20    35-15 = 20          15               -3                 -60
20-30    60-35 = 25          25               -2                 -50
30-40    84-60 = 24          35               -1                 -24
40-50    96-84 = 12          45               0                  0
50-60    127-96 = 31         55               1                  31
60-70    198-127 = 71        65               2                  142
70-80    250-198 = 52        75               3                  156
Total    250                                                     135

A.M. = $\bar{X} = A + \frac{\sum f\,dx'}{N} \times i$, where A = assumed mean = 45 and i = 10

$= 45 + \frac{135}{250} \times 10 = 45 + 5.4 = 50.4$


Illustration 15. 
Find out A.M. from the following table : 


Marks No. of students 
Less than 10 4 

10-20 6 

20-30 10 

30-40 15 

40-50 8 

More than 50 7 


Solution :

Since the table has open ends in the above question, we shall
have to decide the lower limit of the first class and the upper limit
of the last class.

Here the width of each class is 10. Therefore, the first class may be
0-10 and the last class may be 50-60.

Class    Frequency (f)    Mid value (X)    dx' = (X - A)/i    fdx'
0-10     4                5                -2                 -8
10-20    6                15               -1                 -6
20-30    10               25               0                  0
30-40    15               35               1                  15
40-50    8                45               2                  16
50-60    7                55               3                  21
Total    50                                                   38

A.M. = $\bar{X} = A + \frac{\sum f\,dx'}{N} \times i = 25 + \frac{38}{50} \times 10 = 25 + 7.6 = 32.6$


Illustration 16. 
Find out missing frequency from the following table : 


Step deviation    -2    -1    0     1    2
Frequency         31    58    60    ?    27
A.M. = 47.2
Assumed mean = 47.5
Width of C.I. (i) = 3
Solution :
Let the missing frequency be a.

Step deviation (dx')    Frequency (f)    fdx'
-2                      31               -62
-1                      58               -58
0                       60               0
1                       a                a
2                       27               54
Total                   176 + a          -66 + a

A.M. $= A + \frac{\sum f\,dx'}{N} \times i$
=> $47.2 = 47.5 + \frac{(-66 + a) \times 3}{176 + a}$
=> $47.2 - 47.5 = \frac{-198 + 3a}{176 + a}$
=> $-0.3 = \frac{-198 + 3a}{176 + a}$
=> -52.8 - 0.3a = -198 + 3a
=> -3a - 0.3a = -198 + 52.8
=> -3.3a = -145.2
=> $a = \frac{145.2}{3.3} = 44$
Hence the missing frequency is 44.
Illustration 17. 
Mean of the distribution is 25. Find out the missing Frequency : 
Class Interval    Frequency
0-10              3
10-20             7
20-30             20
30-40             6
40-50             ?
50-60             1
Solution :

Class Interval    Frequency (f)    Mid value (X)    (fX)
0-10              3                5                15
10-20             7                15               105
20-30             20               25               500
30-40             6                35               210
40-50             a                45               45a
50-60             1                55               55
Total             N = 37 + a                        885 + 45a

A.M., $\bar{X} = \frac{\sum fX}{N}$

=> $25 = \frac{885 + 45a}{37 + a}$
=> 25 (37 + a) = 885 + 45a
=> 925 + 25a = 885 + 45a
=> 45a - 25a = 925 - 885
=> 20a = 40
=> $a = \frac{40}{20} = 2$


Illustration 18. 

Calculate mean from the following frequency distribution : 
Marks              9-11    11-13    13-15    15-17    17-19    19-21
No. of students    3       7        12       18       10       5

Solution :
Class    Frequency (f)    Mid value (X)    dx' = (X - A)/i    fdx'
9-11     3                10               -3                 -9
11-13    7                12               -2                 -14
13-15    12               14               -1                 -12
15-17    18               16               0                  0
17-19    10               18               1                  10
19-21    5                20               2                  10
Total    55                                                   -15

$\bar{X} = A + \frac{\sum f\,dx'}{N} \times i = 16 + \frac{-15}{55} \times 2 = 16 - 0.55 = 15.45$


Illustration 19. 
Find out A.M. from the following table : 


Size of item 10-15 15-17.5 17.5—20 20-30 30-35 35-40 Above 40 
Frequency 10 15 17 25 28 30 40 


Solution :

Here the class intervals are not equal. Hence we need to reconstruct
the classes to find the A.M. The width of the largest class is
30 - 20 = 10.

Taking the width of classes as 10, we shall reconstruct the classes
and adjust the frequencies. The open-end class 'Above 40' is taken
as 40-50.

Class    Frequency (f)          Mid value (X)    dx' = (X - A)/i    fdx'
10-20    10 + 15 + 17 = 42      15               -1                 -42
20-30    25                     25               0                  0
30-40    28 + 30 = 58           35               +1                 58
40-50    40                     45               2                  80
Total    165                                                        96

A.M. $= A + \frac{\sum f\,dx'}{N} \times i$
$= 25 + \frac{96}{165} \times 10$
= 25 + 5.82
= 30.82
Illustration 20. 
The arithmetic mean height of 50 students of a college is 68 inches. The height 
of 30 of these is given in the frequency distribution below. Find the arithmetic mean 
height of the remaining 20 students : 


Height in inches    64    66    68    70    72
Frequency           4     12    4     8     2
Solution :

Height in inches (X)    Frequency (f)    (fX)
64                      4                256
66                      12               792
68                      4                272
70                      8                560
72                      2                144
Total                   N = 30           $\sum fX$ = 2024

A.M. of 50 students = 68 inches
Hence total height of 50 students = 68 x 50 = 3400 inches

Subtract : Total height of 30 students (from table) = 2024 inches

Hence total height of the remaining 20 students = 3400 - 2024 = 1376 inches

Average height of 20 students $= \frac{1376}{20} = 68.8$ inches
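
The reasoning of Illustration 20 (total of the whole group minus total of the known part, divided by the number remaining) can be sketched in Python as follows. The function name and argument names are hypothetical.

```python
def remaining_group_mean(overall_mean, total_n, values, freqs):
    """Mean of the items not covered by the given frequency table."""
    known_n = sum(freqs)
    known_total = sum(f * x for x, f in zip(values, freqs))
    remaining_total = overall_mean * total_n - known_total
    return remaining_total / (total_n - known_n)


heights = [64, 66, 68, 70, 72]   # data of Illustration 20
counts = [4, 12, 4, 8, 2]
print(remaining_group_mean(68, 50, heights, counts))
```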
Illustration 21. 

The following are the monthly salaries of 30 employees of Balaji Ltd. :

139 126 114 100 88 62 77 91 103 108

129 144 148 63 69 148 132 118 142 116

123 104 95 80 85 106 123 133 140 134

The firm gave bonuses of Rs. 10, 15, 20, 25, 30 and 35 for the individuals in the
respective salary groups : exceeding 60 but not exceeding 75, exceeding 75 but not
exceeding 90 and so on up to exceeding 135 but not exceeding 150. Find the
average bonus paid per employee.
Solution :

Calculation of average bonus :
Class intervals                       Alternative method    Tally      Frequency    Bonus in     Total Bonus
                                      of writing class      marks      (f)          Rs. (X)      in Rs. (fX)
                                      intervals
Exceeding 60 but not exceeding 75     61-75                 |||        3            10           30
Exceeding 75 but not exceeding 90     76-90                 ||||       4            15           60
Exceeding 90 but not exceeding 105    91-105                |||||      5            20           100
Exceeding 105 but not exceeding 120   106-120               |||||      5            25           125
Exceeding 120 but not exceeding 135   121-135               ||||| ||   7            30           210
Exceeding 135 but not exceeding 150   136-150               ||||| |    6            35           210
                                                                       N = 30                    735

Average bonus $\bar{X} = \frac{\sum fX}{N} = \frac{735}{30}$ = Rs. 24.50

Exercise 7 (A) 


1. Marks obtained by 9 students in applied statistics are given below : 
52, 75, 40, 70, 43, 40, 65, 35, 48. 
Calculate arithmetic mean. 


[ Ans. : A.M. = 52] 
2. Calculate arithmetic mean from the following table : 


No. of Children 1 2 3 4 5 6
No. of Family 100 125 200 175 140 75 


[ Ans. : 3.44] 
3. Calculate arithmetic mean from the following table : 


Marks 0—10 10—20 20—30 30—40 40—50 50—60 
No. of Students 5 10 25 30 20 10 


[ Ans. : A.M. = 33] 
4. The heights of 165 American males are noted below. Find the average height :

Height in inches 45-50 50-55 55-60 60-65 65-70 70-75 75-80

No. of Males 2 10 21 55 40 32 5

[ Ans.: $\bar{X}$ = 64.68 inches]
5. The following table gives the male population of city. Find out the average age 


of the whole male population : 
Age group (years) 0-5 5—10 10-15 15-20 20-30 30-40 40—50 50-60 60-70 


Male population (in '000) 9 8 8 7 15 12 9 6 4


[ Ans.: = 26.92 years] 
[ Hint : There is no need from the series. Solve by finding M.V.] 


6. Find out arithmetic mean from the following distribution : 
Group 0-11 11-22 22-33 33-44 44-55 55-66 
Frequency 9 17 28 26 15 8 


[ Ans. : A.M. = 32.3] 
7. Calculate the average profits for all the companies from the following figures of 


profit earned by 1,400 companies during 2010-11 : 


Profit ( ~ in lakh) No. of companies 
200 — 400 500 


400 — 600 300 
600 — 800 280 
800 — 1000 120 
1000 — 1200 100 
1200 — 1400 80 
1400 — 1600 20 


[ Ans. : 605.71 lakh] 
8. Calculate arithmetic mean from the following frequency table : 


Weekly rent (in © ) No. of rent payees persons 
200 — 400 60 

400 — 600 90 

600 — 800 110 

800 — 1000 140 

1000 — 1200 200 

1200 — 1400 150 

1400 — 1600 100 

1600 — 1800 80 

1800 — 2000 70 


[ Ans. : 1,100] 
9. From the following frequency table calculate arithmetic mean : 


Group Frequency 
2.1-2.7 2
2.7-3.3 6
3.3-3.9 7
3.9-4.5 5
4.5-5.1 3
5.1-5.7 2

[ Ans.: $\bar{X}$ = 3.77]


10. The weights (in grams) of articles are given below : 

14, 16, 16, 14, 22, 13, 15, 24, 12, 17, 23, 14, 20, 17, 21, 18, 18, 19, 20, 16, 15, 11, 
12, 21, 20, 17, 18, 19, 22, 23. 

Form a grouped frequency table by dividing the variate range into intervals of 
equal width, one class being 11-13 and then compute the arithmetic mean. 


[ Ans.: $\bar{X}$ = 17.7 gram]


[ Hint : Convert the given series 11-13, 14-16, 17-19, .... into continuous series 
and, then solve. ] 
11. Following is distribution of persons in Great Britain under different income 


groups : 
Income No. of Persons Income No. of Persons 
(in ‘000 pounds) ('000) (in ‘000 pounds) ('000) 
O0O—1 13 10—25 27 
1—2 90 25—50 6 
2—3 81 50—100 2 
3—5 117 100—1000 2 
5—10 66 
Obtain average income per person. 


[ Ans. : A.M. = 8,056 pounds] 

12. Find arithmetic mean from the following : 
Marks below 10 20 30 40 50 60 70 80 
No. of students 15 35 60 84 96 127 198 250 


[ Ans.: = 50.4 marks] 


13. Determine the arithmetic mean in the following series : 
Height (cm) below 10 20 30 40 50 
Frequency 15 35 60 84 96 


[Ans.: = 24.792] 


14. From the following data calculate arithmetic mean : 
Marks No. of students 

Less than 5 7 

Less than 10 20 

5-15 38 

15 and above 55 

20-25 20 

25 and above 5 

30 and above 1 


[ Ans. : 15.45 marks] 


15. Calculate mean from the following : 
Income between 100-200 100-300 100-400 100-500 100-600 
No. of persons 15 33 63 83 100 
[ Ans.: = 356] 
16. Number of workers in 1,000 cloth units are given in the following table. Find 


average size of units : 


No. of workers No. of units 
10—20 10 

10—30 40 

10—40 110 

10—50 260 

10—60 560 

10—70 770 

10—80 920 

10—90 980 

10—100 1000 


[ Ans. : A.M. = 58.5] 
[ Hint : Class 10—20 20—30 30—40 .... Other 
Frequency 10 40 — 10 = 30 110 — 40 = 70] 


17. In a class the number of books issued to students from the library were as 
follows : 


No. of students    Books issued
7                  0
12                 1
10                 2
7                  3
6                  4
4                  5
3                  6
1                  7

Find the average no. of books issued to a student of the class.

[ Ans.: $\bar{X}$ = 2.44 books]


[ Hint : Issued books are x and number of students is f. ] 
18. The working results in 2003 of 50 branches of a concern given below. Find the 


average profit per branch : 


Loss Profit No. of branches 
Loss 

2000-3000 3 

1000-2000 5 

0-—1000 6 

Profit 

0-—1000 12 

1000-2000 16 

2000-3000 8 


[ Ans.: =~ 640] 

19. Calculate the arithmetic mean from the data : 
Intervals 0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39 
Frequencies 2 4 6 10 7 6 3 1

[ Ans.: $\bar{X}$ = 18.54 approx.]


[ Hint : It is not essential to convert the inclusive series into exclusive series for 


arithmetic mean.] 
20. Find out the mean from the following, if the step deviations, /.e., dx have been 
taken from the assumed mean 25 and the magnitude of the class intervals is 5 : 
dx -3 -2 -1 0 1 2 3 4
f 6 14 22 35 16 9 4 2
[ Ans.: $\bar{X}$ = 24.35]
[ Hint : Magnitude of the class interval i = 5]
21. Find the missing frequency from the following frequency distribution when 
arithmetic mean is 22: 
Sales (in lac. © ) 1-10 11-20 21-30 31-40 41-50 
Frequency 57632 


[ Ans. : 4] 
22. From the following data find the missing figure : 
Salary (in © ) 110 112 113 117 120 125 128 130 Total 
No. of labourers 25 ? 13 ? 14 8 6 2 100
Average salary being Rs. 115.86.
[ Ans.: $f_1$ = 17, $f_2$ = 15]
23. Arithmetic mean of the following frequency distribution is 1.46. Find $f_1$ and
$f_2$ :
No. of accidents 0 1 2 3 4 5 Total
No. of days 46 $f_1$ $f_2$ 25 10 5 200

[ Ans.: $f_1$ = 76, $f_2$ = 38]

24. Find missing value when mean is 68.25 : 
x 50 58 60 65 70 ? 80 100 
f 2 20 5 35 8 10 16 4

[ Ans.: 75] 


25. The arithmetic mean of the following series is ~ 46.06. Find the missing weight 


Weight 20 30 ? 50 60 70 
Frequency 2 4 10 9 5 3
[ Ans.: x = 40]


MEDIAN 


If a series is arranged in ascending or descending order, the
value of the middle item of that series is called the ‘Median’. It is
denoted by ‘M’. According to A.L. Bowley, “If the numbers of the group
are ranked in order according to the measurement under consideration,
then the measurement of the number most nearly one-half is the
median.”

According to Professor Connor, “Median is the value of that item in
a series which divides it into two equal parts, one part consisting
of all values greater and the other all values less than the median.”

Hence median is the middle term in a series, i.e., it is a measure
of central position in the series.


Computation Procedure of Median 
(a) Calculation of Median in Individual Series 

The procedure to calculate the median in an individual series is as
follows :

(i) First of all the series is arranged in ascending or descending
order.

(ii) Then all items in the series are assigned serial numbers 1, 2,
3, ....

(iii) After it, the following formula is used :

M = size of $\left(\frac{N+1}{2}\right)$th item

Here, M = Median,

N = No. of items

If the number of items is odd, then $\frac{N+1}{2}$ will be a whole
number, and this number is simply a serial number in the series. The
item against this number will be the median.

If the number of items is even, then $\frac{N+1}{2}$ will not be a
whole number; it will be a decimal. For example, if the number of
items is 8, then $\frac{8+1}{2}$ = 4.5. In such a case there are two
middle items, the 4th and the 5th. The mean of these two items will
be the required median.


Illustration 1.
Calculate median from the following data :
10, 28, 32, 300, 280, 1,200, 15, 30, 800
Solution :
Arranging the items in ascending order and giving serial numbers :
Serial Number    Item
1                10
2                15
3                28
4                30
5                32
6                280
7                300
8                800
9                1,200

M = size of $\left(\frac{N+1}{2}\right)$th item
= size of $\left(\frac{9+1}{2}\right)$th item
= size of 5th item
So M = 32


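Illustration 2.
Calculate median from the following data :
5, 12, 6, 8, 7, 10, 14, 13.

Solution :

Serial Number    Item
1                5
2                6
3                7
4                8
5                10
6                12
7                13
8                14

M = size of $\left(\frac{N+1}{2}\right)$th item
= size of $\left(\frac{8+1}{2}\right)$th item
= size of 4.5th item

Size of 4.5th item $= \frac{\text{size of 4th item} + \text{size of 5th item}}{2} = \frac{8 + 10}{2} = 9$

Hence the value of median will be 9.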
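
The rule for an individual series, including the even case, can be sketched in Python as below. The function name is illustrative only; the two calls reproduce Illustrations 1 and 2.

```python
def median_individual(items):
    """Median of an individual series using the ((N + 1) / 2)th item rule."""
    ordered = sorted(items)
    position = (len(ordered) + 1) / 2      # serial number of the middle item
    if position.is_integer():              # odd N: one middle item
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]     # even N: mean of the two
    upper = ordered[int(position)]         # middle items
    return (lower + upper) / 2


print(median_individual([10, 28, 32, 300, 280, 1200, 15, 30, 800]))
print(median_individual([5, 12, 6, 8, 7, 10, 14, 13]))
```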

(b) Calculation of Median in Discrete Series

Items are given with their frequencies in a discrete series. In such
a series the median is calculated as follows :

(i) First we arrange the items in ascending or descending order.
It is necessary to note that when the place of an item is changed in
this arrangement, its frequency moves with it and remains the same
as it was before.

(ii) We find the cumulative frequency of each item. The cumulative
frequency of the first item is the same as the frequency of the first
item. Cumulative frequency of second item = cumulative frequency of
first item + frequency of second item; cumulative frequency of third
item = cumulative frequency of second item + frequency of third item;
and so on. Calculating in this way we shall find the cumulative
frequency of the last item, which will be equal to $\sum f$ or N.

(iii) Then we use the following formula :

m = size of $\left(\frac{N+1}{2}\right)$th item. Here, N = $\sum f$ = total of frequencies

(iv) After it, we see in which cumulative frequency the above
calculated m falls for the first time. The item against the
cumulative frequency in which m falls for the first time is the
Median (M).


Illustration 3.
Calculate median from the following data :

Measurement    5    10    15    20    25    30
Frequency      2    6     9     12    7     4

Solution :
Item (x)    Frequency (f)    Cumulative Frequency (cf)
5           2                2
10          6                2 + 6 = 8
15          9                8 + 9 = 17
20          12               17 + 12 = 29
25          7                29 + 7 = 36
30          4                36 + 4 = 40
            $\sum f$ = N = 40

m = size of $\left(\frac{N+1}{2}\right)$th item $= \frac{40 + 1}{2} = 20.5$

20.5 comes first in c.f. 29, hence the item against it, i.e., 20 will
be the median.
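
The cumulative-frequency search for a discrete series can be sketched in Python as follows. The function name is an assumption; the data are those of Illustration 3.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))           # items in ascending order
    cumulative = list(accumulate(f for _, f in pairs))
    m = (cumulative[-1] + 1) / 2                 # ((N + 1) / 2)th item
    for (value, _), cf in zip(pairs, cumulative):
        if cf >= m:                              # first c.f. in which m falls
            return value


measurements = [5, 10, 15, 20, 25, 30]
frequencies = [2, 6, 9, 12, 7, 4]
print(median_discrete(measurements, frequencies))
```

Sorting the pairs together keeps each frequency attached to its own item, which is the point made in step (i).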


(c) Calculation of Median in Continuous Series

In a continuous series, the items are given in groups (classes). So
to find the median we have to determine two things : first the median
class and second the median. The calculation procedure is as
follows :

(1) To find the Median Class :
(i) If the given series is inclusive then we shall change it into
an exclusive series.

(ii) After it we shall calculate the cumulative frequencies
according to the previous method and calculate m by the following
formula :
m = size of $\left(\frac{N}{2}\right)$th item

(iii) The class against the cumulative frequency in which m comes
for the first time is the median class.

(2) To find the Value of Median : After finding the median group
(or class) by the above method, the following formula is used to
calculate the median :

$M = L_1 + \frac{i}{f}(m - C)$

or $M = L_1 + \frac{L_2 - L_1}{f}(m - C)$

or $M = L_1 + \frac{m - C}{f} \times i$

where, $L_1$ = lower limit of median class

$L_2$ = upper limit of median class

i = width of median class

f = frequency of median class
C = cumulative frequency of the class just before the median class
m = $\frac{N}{2}$
Illustration 4.
Find out median from the following data :
Income            20-30    30-40    40-50    50-60    60-70    70-80    80-90
No. of persons    69       167      207      65       58       24       10
Solution :
Calculation of median group
Income (x)    No. of persons (f)    Cumulative frequency (cf)
20-30         69                    69
30-40         167                   236 (C)
40-50         207 (f)               443
50-60         65                    508
60-70         58                    566
70-80         24                    590
80-90         10                    600

m = $\frac{N}{2} = \frac{600}{2}$ = 300

300 comes first in c.f. 443, hence the median group will be 40-50.
Now using the following formula to calculate median :

$M = L_1 + \frac{L_2 - L_1}{f}(m - C)$

The median group is 40-50. Hence $L_1$ = 40, $L_2$ = 50, f = 207 and
C = 236.

Substituting the values in the formula,

$M = 40 + \frac{50 - 40}{207}(300 - 236)$

$= 40 + \frac{10}{207} \times 64 = 43.09$
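
The two stages (locating the median class, then interpolating inside it) can be sketched in Python as below. The function name is hypothetical, and the classes are assumed to be exclusive and given in ascending order.

```python
def median_continuous(classes, freqs):
    """Median of a continuous series: L1 + (L2 - L1) / f * (m - C)."""
    m = sum(freqs) / 2                 # m = N / 2
    c = 0                              # c.f. of the classes already passed
    for (low, high), f in zip(classes, freqs):
        if c + f >= m:                 # first class whose c.f. reaches m
            return low + (high - low) / f * (m - c)
        c += f


incomes = [(20, 30), (30, 40), (40, 50), (50, 60),
           (60, 70), (70, 80), (80, 90)]          # Illustration 4
persons = [69, 167, 207, 65, 58, 24, 10]
print(round(median_continuous(incomes, persons), 2))
```

Because the class width is taken from each class's own limits, the same function also works when the classes are of unequal width.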
Advantages, Disadvantages and Uses of Median 


Advantages : 

(1) It is rigidly defined. 

(2) Its calculation is very easy. 

(3) It is not affected by extreme values. 
(4) It is readily and easily obtained without measuring all the
items to be observed, if the items can be arranged in order of
magnitude. To find the median height of 151 students, it is not
necessary to measure the heights of all of them; they may be asked to
stand in order of increasing height, and the height of the middle
student, i.e., of the 76th, will give the median height.

(5) It is determined by graph also. 

(6) It gives good results in a study of qualitative measurement 
such as intelligence, honesty. 

(7) It can be calculated for the distributions with open end classes. 

Disadvantages : 

(1) Median is not based on all the observations of a series. 

(2) Its algebraic treatment is not possible. 

(3) It requires the data to be arranged in ascending or descending 
order. 

(4) In the case of even number of items it does not represent the 
real data. 

Uses : 

(1) When numerical measurement of the variate is not possible, as
with intelligence, colour, qualification etc., we use the median.

(2) If the extreme values are very big or small, then the median is
the most suitable measure.

(3) When the items at both ends of a series are not known, we use
the median.


To Change the Inclusive Series into the Exclusive Series


When an inclusive series is given, we calculate the median by
changing it into an exclusive series. For this we use two methods :

(i) To take upper limit of a class equal to the lower limit of the
next class : In this method we give exclusive form to the given
series by taking the upper limit of each class equal to the lower
limit of the next class. For example, if the given series is 0-4,
5-9, 10-14, 15-19, then we change it into 0-5, 5-10, 10-15, 15-20.
We use this method only when the difference between the upper limit
of a class and the lower limit of the next class is 1. In usual
practice we do not use this method.

(ii) With half of the difference between upper limit of each
class and lower limit of next class : This method is mostly used in
practice. In this method we find the difference between the upper
limit of a class and the lower limit of the next class. We subtract
half of this difference from the lower limit of each class and add
it to the upper limit of each class. Thus the inclusive series
changes into an exclusive series. We can understand it with the
following example :


Illustration 5.
From the following data, calculate the value of median :

Marks              1-5    6-10    11-15    16-20    21-25
No. of students    3      6       14       12       5
Solution :
Marks    Converting into exclusive series (X)     f     c.f.
1-5      (1 - 0.5) - (5 + 0.5) = 0.5 - 5.5        3     3
6-10     (6 - 0.5) - (10 + 0.5) = 5.5 - 10.5      6     9
11-15    (11 - 0.5) - (15 + 0.5) = 10.5 - 15.5    14    23
16-20    (16 - 0.5) - (20 + 0.5) = 15.5 - 20.5    12    35
21-25    (21 - 0.5) - (25 + 0.5) = 20.5 - 25.5    5     40
                                                  $\sum f$ = N = 40

To find median group :

m = $\frac{N}{2} = \frac{40}{2}$ = 20

20 comes first in c.f. 23, hence the class against it, i.e.,
10.5 - 15.5 will be the median class.
Therefore, $L_1$ = 10.5, $L_2$ = 15.5, f = 14, C = 9
Calculation of median :

$M = L_1 + \frac{L_2 - L_1}{f}(m - C)$

Substituting the above values in the formula

$M = 10.5 + \frac{15.5 - 10.5}{14}(20 - 9)$

$= 10.5 + \frac{5}{14} \times 11 = 14.43$

Hence the value of median will be 14.43.
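
Method (ii) for converting an inclusive series, followed by the median calculation, can be sketched in Python as follows. Both function names are assumptions made for illustration, and the gap between consecutive classes is assumed to be the same throughout the series.

```python
def to_exclusive(classes):
    """Widen each inclusive class by half the gap to the next class."""
    half_gap = (classes[1][0] - classes[0][1]) / 2
    return [(low - half_gap, high + half_gap) for low, high in classes]


def median_continuous(classes, freqs):
    """Median by L1 + (L2 - L1) / f * (m - C), with m = N / 2."""
    m = sum(freqs) / 2
    c = 0
    for (low, high), f in zip(classes, freqs):
        if c + f >= m:
            return low + (high - low) / f * (m - c)
        c += f


marks = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25)]   # Illustration 5
students = [3, 6, 14, 12, 5]

exclusive = to_exclusive(marks)
print(exclusive)
print(round(median_continuous(exclusive, students), 2))
```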
Calculation of Median when Mid-values are Given

When the mid-values of items are given in a question, the series
looks like a discrete series. Hence it is necessary to change it into
a continuous series. For this we find the difference between two
consecutive mid-values; then we find the lower limit of each class by
subtracting half of this difference from its mid-value, and the upper
limit by adding half of this difference to its mid-value. It can be
understood clearly from the following example :


Illustration 6.
From the following data calculate the value of median :
Mid-value    10    20    30    40    50    60
Frequency    2     6     12    7     2     1

Solution :
Mid value (m.v.)    Conversion into continuous series (x)    f     c.f.
10                  (10 - 5) - (10 + 5) = 5 - 15             2     2
20                  (20 - 5) - (20 + 5) = 15 - 25            6     8
30                  (30 - 5) - (30 + 5) = 25 - 35            12    20
40                  (40 - 5) - (40 + 5) = 35 - 45            7     27
50                  (50 - 5) - (50 + 5) = 45 - 55            2     29
60                  (60 - 5) - (60 + 5) = 55 - 65            1     30
                                                             N = 30

m = $\frac{N}{2} = \frac{30}{2}$ = 15

15 comes for the first time in c.f. 20, hence the class against it,
i.e., 25 - 35 will be the median class.

Hence $L_1$ = 25, $L_2$ = 35, f = 12, m = 15 and C = 8

$M = L_1 + \frac{L_2 - L_1}{f}(m - C)$

$= 25 + \frac{10}{12}(15 - 8) = 25 + \frac{10}{12} \times 7 = 30.83$
Hence the value of median will be 30.83.
Illustration 7. 


Find out missing frequency X for the following frequency distribution if median 
value is 86. 


Class Intervals Frequency 
40-50 2 
50-60 1 
60-70 6 
70-80 6 


80-90 X 
90-100 12 
100-110 5 


(Agra 2001) 
Solution: 


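Class Interval    Frequency    c.f.
40-50                 2          2
50-60                 1          3
60-70                 6          9
70-80                 6         15
80-90                 X         15 + X
90-100               12         27 + X
100-110               5         32 + X
Median = 86
∴ 80-90 will be median class, and N = 32 + X.

Now
$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} $$

$$ 86 = 80 + \frac{\left(\frac{32 + X}{2} - 15\right) \times 10}{X} $$

$$ 6 = \frac{5(2 + X)}{X} $$

⇒ 6X = 10 + 5X

⇒ 6X - 5X = 10

∴ X = 10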
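The missing frequency can also be obtained by rearranging the median formula for X. The sketch below is an illustration only: the function name and its parameters are hypothetical, and it assumes that the unknown frequency belongs to the median class itself, as in Illustration 7.

```python
from fractions import Fraction


def missing_median_class_frequency(l1, width, median, known_total, cf_before):
    """Solve  median = l1 + ((known_total + X) / 2 - cf_before) * width / X  for X.

    known_total is the sum of all known frequencies and cf_before is the
    cumulative frequency of the classes preceding the median class.
    """
    numerator = width * (Fraction(known_total, 2) - cf_before)
    denominator = Fraction(median) - l1 - Fraction(width, 2)
    return numerator / denominator


print(missing_median_class_frequency(80, 10, 86, 32, 15))   # 10
```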
Illustration 8. 


Number of families according to size of land are given below. Find out the 
median values of land : 


Size of land (In hectare) No. of families 
0-1 550 

1-3 600 

3-5 400 

5-10 250 

10-20 110 

20-50 85


50 and above 5 


Total 2000 
Solution: 

Here, class intervals are not the same, but their equality is not 
essential in the calculation of median. 


Class           Frequency    c.f.
0-1                550        550
1-3                600       1150
3-5                400       1550
5-10               250       1800
10-20              110       1910
20-50               85       1995
50 and above         5       2000
Total             2000

$$ \frac{N}{2} = \frac{2000}{2} = 1000 $$

c.f. just greater than 1000 is 1150. Hence class 1-3 will be median
class and i = 3 - 1 = 2.

$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} = 1 + \frac{(1000 - 550) \times 2}{600} = 1 + \frac{900}{600} = 1 + 1.5 = 2.5 $$


Illustration 9. 
Find out the missing frequencies from the following table : 


Expenditure (in ₹)    0-20    20-40    40-60    60-80    80-100    Total = 100
No. of families        14       ?        27       ?        15      Median = 50
Solution :
Let $f_1$ be the frequency of the class 20-40 and $f_2$ be the
frequency of the class 60-80. Median is 50, which lies between 40 and
60. Hence the class 40-60 will be median class.

Class      Frequency    c.f.
0-20          14        14
20-40         f_1       14 + f_1
40-60         27        41 + f_1
60-80         f_2       41 + f_1 + f_2
80-100        15        56 + f_1 + f_2 = N = 100

$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} $$

$$ 50 = 40 + \frac{(50 - 14 - f_1) \times 20}{27} $$

$$ 50 - 40 = \frac{(36 - f_1) \times 20}{27} $$

$$ 10 = \frac{(36 - f_1) \times 20}{27} $$

⇒ 270 = (36 - f_1) 20
⇒ 27 = (36 - f_1) 2
⇒ 27 = 72 - 2 f_1
⇒ 2 f_1 = 72 - 27 = 45
∴ f_1 = 22.5 ≈ 23
N = 100 = 56 + f_1 + f_2
⇒ 100 - 56 = f_1 + f_2
⇒ 44 = f_1 + f_2
⇒ 44 = 23 + f_2   [∵ f_1 = 23]
⇒ 44 - 23 = f_2
∴ f_2 = 21


Illustration 10. 

The description of obtained marks in statistics of 50 students is given below. 
Find out median marks. If 60 percent students have passed the test, then find out, 
what will be the minimum marks to pass the 
test : 


More than 0 10 20 30 40 50 


No. of students 50 46 40 20 10 3 
Solution: 
Here cumulative frequency distribution (more than) is given.
Hence we shall convert it into simple frequency distribution.
Class      Frequency          Cumulative Frequency
                              Less than    More than
0-10       50 - 46 = 4            4           50
10-20      46 - 40 = 6           10           46
20-30      40 - 20 = 20          30           40
30-40      20 - 10 = 10          40           20
40-50      10 - 3 = 7            47           10
50-60      3                     50            3
Here, $m = \frac{N}{2} = \frac{50}{2} = 25$
c.f. just greater than 25 is 30, hence 20-30 will be median class.

$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} = 20 + \frac{(25 - 10) \times 10}{20} = 20 + \frac{150}{20} = 20 + 7.5 = 27.5 $$


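Since 60 per cent of the students have passed, 40 per cent have failed.
Hence the minimum pass marks are given by the 40th percentile :

$$ p_{40} = \frac{50 \times 40}{100} = 20 $$

20 first comes in the less than c.f. 30, so the class is 20-30.

$$ P_{40} = L_1 + \frac{L_2 - L_1}{f}(p_{40} - c) = 20 + \frac{30 - 20}{20}(20 - 10) = 20 + 5 = 25 $$

Hence the minimum marks to pass the test are 25. As a check, 20 students
scored more than 30 and 40 students scored more than 20, so 30 students
(60 per cent of 50) scored more than 25.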
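The conversion of a "more than" cumulative series into simple and "less than" frequencies, used at the start of this solution, can be sketched as follows. The helper name is hypothetical; it assumes that nothing lies beyond the last class.

```python
from itertools import accumulate


def frequencies_from_more_than(more_than):
    """Each class frequency is the drop between consecutive 'more than' values."""
    return [a - b for a, b in zip(more_than, more_than[1:] + [0])]


more_than = [50, 46, 40, 20, 10, 3]          # more than 0, 10, 20, 30, 40, 50
freqs = frequencies_from_more_than(more_than)
less_than = list(accumulate(freqs))

print(freqs)       # [4, 6, 20, 10, 7, 3]
print(less_than)   # [4, 10, 30, 40, 47, 50]
```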


Illustration 11. 
Find out median from the following table : 


Marks No. of Students 
Less than 10 15 

Less than 20 35 

Less than 30 60 

Less than 40 84 

Less than 50 96 

Less than 60 127 

Less than 70 198 

Less than 80 250 


Solution : 
Here cumulative frequency (less than) is given. Hence we shall 
convert it into simple frequency distribution. 
Class Frequency Cumulative Frequency 
0-10 15 15 
10-20 35-15 = 20 35 
20-30 60-35 = 25 60 
30—40 84-60 = 24 84 
40-50 96-84 = 12 96 
50-60 127-96 = 31 127 
60—70 198-127 = 71 198 
70-80 250-198 = 52 250 


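$$ \frac{N}{2} = \frac{250}{2} = 125 $$
Now c.f. just greater than 125 is 127. So the class against it, i.e.,
50-60, is the median class.

$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} = 50 + \frac{(125 - 96) \times 10}{31} = 50 + \frac{290}{31} = 50 + 9.35 = 59.35 $$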
Illustration 12. 


Let N = 100, and there are 9 classes of common difference. First class is 10— 
20. The cumulative frequencies of 5th, 6th, 7th and 8th class are 45, 70, 90 and 99 


respectively. Find median. 
(Indore 2004) 
Solution : 

Here N = 100 and the total number of classes is 9. The width of all the
classes is equal, i.e., 10, because the first class is 10-20. So the table
will be as follows :

Class Frequency Cumulative Frequency 
10-20 

20-30 

30—40 

40-50 

50-60 45 

60-70 25 70 

70-80 20 90 

80-90 9 99 

90-100 1 100 


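$$ \frac{N}{2} = \frac{100}{2} = 50 $$

and c.f. just greater than 50 is 70. So 60-70 will be median class. Now

$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} = 60 + \frac{(50 - 45) \times 10}{25} = 60 + \frac{50}{25} = 60 + 2 = 62 $$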


Illustration 13. 
Find the median from the following frequency table : 


Marks No. of students 
0-10 15 

10-30 25 

30-60 30 


60-70 4 
70-90 10 


Solution : 

Adjustment of frequencies is not essential to find the median. 
Class Frequency Cumulative Frequency 

0-10 15 15 

10-30 25 40 

30-60 30 70 

60-70 4 74 

70-90 10 84 


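$$ \frac{N}{2} = \frac{84}{2} = 42 $$
and c.f. just greater than 42 is 70, so 30-60 will be median class. Now

$$ \text{Median} = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} = 30 + \frac{(42 - 40) \times 30}{30} = 30 + \frac{60}{30} = 30 + 2 = 32 $$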


Illustration 14. 
Calculate median from the following data : 


Variable          Frequency
Less than 5           5
Less than 10          8
15-20                 8
20 and above         44
25-30                 9
30 and above         11
35 and above          5

Solution : 
Working                              x        f                    c.f.
f of less than 5 is 5                0-5      5                      5
f of less than 10 is 8, so           5-10     8 - 5 = 3              8
                                     15-20    8                     16
f of 20 and above is 44, so          20-25    44 - 9 - 11 = 24      40
                                     25-30    9                     49
f of 30 and above is 11, so          30-35    11 - 5 = 6            55
                                     35-40    5                     60
                                              N = 60
Calculation of median class : $m = \frac{N}{2} = \frac{60}{2} = 30$
30 first comes in c.f. 40, so 20-25 will be median class.
Hence $L_1 = 20$, $L_2 = 25$, $f = 24$, $c = 16$

Calculation of Median :

$$ M = L_1 + \frac{L_2 - L_1}{f}(m - c) = 20 + \frac{25 - 20}{24}(30 - 16) = 20 + \frac{5}{24} \times 14 = 22.92 $$

Hence Median will be 22.92


Illustration 15. 
From the following data, find median : 


Wages more than 20 30 40 50 60 70 80 


No. of persons 80 74 64 48 30 18 8 
Solution: 


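C.I.       Actual Frequency (f)     c.f.
20-30      (80 - 74) = 6              6
30-40      (74 - 64) = 10            16
40-50      (64 - 48) = 16            32
50-60      (48 - 30) = 18            50
60-70      (30 - 18) = 12            62
70-80      (18 - 8) = 10             72
80-90      (8 - 0) = 8               80
           N = Σf = 80

m = Size of $\frac{N}{2}$ th item

= Size of $\frac{80}{2}$ th item = Size of 40th item

40 first comes in c.f. 50. So the class against it, i.e., 50-60, will be
median class. Hence

$L_1 = 50$, $L_2 = 60$, $f = 18$, $c = 32$

$$ M = L_1 + \frac{L_2 - L_1}{f}(m - c) $$

Substituting the values in the formula,

$$ M = 50 + \frac{60 - 50}{18}(40 - 32) = 50 + \frac{80}{18} = 50 + 4.44 = 54.44 $$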


Illustration 16. 
Calculate the Mean and Median from the following data : 


Marks Frequency 

10-25 6 

25-40 20 

40-55 44 

55-70 26 

70-85 3 

85-100 1 

(Ujjain 2004) 

Solution : 


Marks      m.v.     f      dx (A = 47.5)     fdx      c.f.
10-25      17.5      6         -30           -180        6
25-40      32.5     20         -15           -300       26
40-55      47.5     44           0              0       70
55-70      62.5     26          15            390       96
70-85      77.5      3          30             90       99
85-100     92.5      1          45             45      100
                 Σf = 100                 Σfdx = 45

(i) Mean

$$ \bar{x} = A + \frac{\sum f\,dx}{N} = 47.5 + \frac{45}{100} = 47.95 \text{ marks.} $$

(ii) Median m = size of $\frac{N}{2}$ th item
= size of $\frac{100}{2}$ th item
= size of 50th item
c.f. just greater than 50 is 70, hence 40-55 will be median group.

$$ M = L_1 + \frac{L_2 - L_1}{f}(m - c) = 40 + \frac{55 - 40}{44}(50 - 26) = 40 + \frac{15}{44} \times 24 = 48.18 \text{ marks.} $$


Exercise 7 (B) 


1. Find median for the following : 
20, 18, 22, 27, 25, 12, 15. 


[ Ans. : 20] 


2. From the following data, find median : 
100, 80, 150, 90, 160, 140, 200, 50, 180, 170. 


[ Ans. : Median = 145] 
3. Find median from the following table : 
No. of children    1   2   3   4   5   6   7   8   9
No. of families    3   9  15  21  15  12  10   8   5
[ Ans. : Median = 5]
4. Find median from the following table :
Marks obtained    0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of students     4     9      6     13     27     21     12      8
[ Ans. : Median = 46.67]
5. Find median from the following table :
Marks obtained    30.5-39.5  40.5-49.5  50.5-59.5  60.5-69.5  70.5-79.5  80.5-89.5
No. of students       5         22         63         74         30          6
[ Ans. : 61.35]
[ Hint : Find median by subtracting 0.5 from lower limit and adding 0.5 to upper 
limit and thus converting the given series into exclusive series. ] 
6. Find median from the frequency distribution of marks of Economics : 


Marks              30-35   25-30   20-25   15-20   10-15   5-10   0-5
No. of students      4       8      12      16      10      6      4

[ Ans. : M = 18.125] (Sagar 2004) 

7. Monthly income of some families are given in the following table. Median is
₹ 75. Find the missing frequency :

Monthly income of families (in ₹)   0-100  100-200  200-300  300-400  400-500  500-600
Number of families                   400      X        50       30       15        5

[ Ans. : 100] 

8. An incomplete frequency distribution is given below : 

Height           5.1-6.0  6.1-7.0  7.1-8.0  8.1-9.0  9.1-10.0  10.1-11.0  11.1-12.0
No. of Plants       3        8        27       X        17        11          9
It is known that the median height of the plants is 8.53 inch. Find the missing
frequency.

[ Ans. : 25]

[ Hint : Class intervals are inclusive. Convert it into exclusive series by subtracting
0.05 from lower limits and adding 0.05 to upper limits. Then find the value of
X by using the formula of median. ]


9. From the following data calculate median : 
Value less than 10 20 30 40 50 60 70 80 
Frequency 4 16 40 76 96 112 120 125 

[ Ans. : M = 36.25] 


10. Correct the following table and find median : 
Size Frequency Size Frequency 

10-15 10 30-35 28 

15-17.5 15 35-40 30 

17.5-20 17 above 40 40 

20-30 25 


[ Ans. : 32.7] 


11. 5 students failed in a test from a batch of 15 students. Marks obtained by ten 
students are 9, 6, 7, 8, 8, 9, 6, 5, 4 and 7. What is the median of marks obtained 
by all students ? 

[ Ans. : 6] 


12. An incomplete frequency distribution is given below : 
Variables Frequency 

10-20 12 

20-30 30 

30-40 ? 

40-50 65 

50-60 ? 

60-70 25 

70-80 18 


Total 229 
You have given that the value of median is 46. Find the missing frequencies : 


[ Ans. : 34, 45] 
To Find Mean and Median 


13. Calculate mean and median from the following data : 
Marks              32   45   62   75   80
No. of students     8   15   10    4    2
[ Ans. : $\bar{x}$ = 51.56 marks, M = 45 marks]
14. From the following series, find Mean and Median :
Value        10   15   12   20   16   30
Frequency     8   14   10    9   25    4
[ Ans. : $\bar{x}$ = 15.86, M = 16]
15. From the following data calculate Arithmetic mean and median :
Marks              5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45
No. of students      6     5     15     10      5      4      2      2
[ Ans. : $\bar{x}$ = 20.87, M = 19.5]


16. Calculate Mean and Median from the following data :
Mid-value    10   20   30   40   50   60
Frequency     6    9   25   18    7    5
[ Ans. : $\bar{x}$ = 33.71, M = 33]
17. Calculate Mean and Median from the following data :
Salary (less than ₹)   50   70   90   110   130   150
No. of workers         30   46   65    85    95   100
[ Ans. : $\bar{x}$ = ₹ 75.8, M = ₹ 74.216]
18. Calculate Mean (by short-cut method) and Median expenditure from the
following data :
Expenditure (below ₹)    5   10   15   20   25
No. of students          6   16   28   38   46
[ Ans. : $\bar{x}$ = ₹ 12.93, M = ₹ 12.92] (Jabalpur 2005)

19. Calculate Median and Mean from the following series :

Marks more than     0    10    20    30    40    50    60    70    80

No. of Students    240   215   200   180   165   145   115    50     0

[ Ans. : $\bar{x}$ = 49.58 marks; M = 58.33 marks] (Indore 2003)

20. Assume N = 100 and there are 9 class—intervals all of equal size. The first 
class—interval is 10 and under 20. The cumulative frequency of the 5th, 6th, 7th 
and 8th class—intervals are 45, 70, 90 and 99 respectively. Find median. 

[ Ans. : M = 62] 


MODE 


Mode is a positional average like median. Most of the items in a
series concentrate around the mode. The word 'Mode' is derived from the
French word 'La mode', which means tradition or fashion. Mode is
also used in the same sense in statistical analysis. For example, if the
mode of size of shoes in India is 7, manufacturers produce the maximum
number of shoes of size 7 in each design. So mode is the item which
occurs most frequently in the series. In general words, the item having
maximum frequency in a series is called mode.
The symbol Z is used for it in statistics.


Definitions 

According to Kenny and Keeping , “The value of the variable 
which occurs most frequently in a distribution is called the mode.” 

According to Boddington , “Mode may be defined as the pre- 
dominant kind, type or size of item or the position of greatest 
density.” 

According to Riggleman, “Mode is the point of the greatest 
frequency of occurrence. It is the most common value .” Thus, mode 
is the value of that item which possesses the maximum frequency. It 
is the point of maximum density. 


COMPUTATION PROCEDURE OF MODE 


(a) Calculation of Mode in Individual Series : There are the
following three methods to calculate mode :

(1) By inspection.

(2) By converting individual series into discrete or continuous
series.

(3) On the basis of median and arithmetic mean.

(1) Calculation of mode by inspection : In this method we see
which item occurs the maximum number of times. If one item clearly
occurs more often than any other, then that item will be the mode.


Illustration 1. 

From the following data, find mode : 

5, 8, 8, 10, 5, 4, 3, 2, 8, 5, 4, 8, 10, 8, 5. 

In the above series frequency of 5 is 4, frequency of 8 is 5,
frequency of 10 is 2 and frequency of 4 is 2. It is clearly seen that 8
has occurred maximum number of times. Hence mode = 8.
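Counting by inspection can be checked with a short script. This sketch uses Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

data = [5, 8, 8, 10, 5, 4, 3, 2, 8, 5, 4, 8, 10, 8, 5]

counts = Counter(data)                        # frequency of every item
value, frequency = counts.most_common(1)[0]   # item with the highest frequency

print(counts[5], counts[8], counts[10], counts[4])   # 4 5 2 2
print(value, frequency)                              # 8 5
```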


(2) By converting individual series into discrete or
continuous series : If the number of items in the given series is
large, then mode is calculated by converting the individual series into
a discrete or continuous series. Its clear description is as follows :

(i) By converting into discrete series : For it, the different items of
the series are written in order and their frequencies are determined.
Thus the size of the series becomes small and mode is calculated as in a
discrete series. Its description is given on the coming pages of this
chapter.

(ii) By converting into continuous series : If the size of the
series is huge, or the individual values do not tend to occur more than
once, then the individual series is converted into a continuous series.
After converting into a continuous series, the mode is calculated by the
prescribed formula. Its description is given on the coming pages of this
chapter.

(3) With the help of median and mean : If a clear value of the
mode is not obtained by the above methods, then mode is calculated
with the help of median and mean. For it, median and mean of the
series are calculated and then mode is calculated by the following
formula :

$$ \text{Mode} = Z = 3M - 2\bar{x} $$

This method may also be used in discrete and continuous series.

(b) Calculation of Mode in Discrete Series : 

Mode may be calculated by following two methods : 

(1) By inspection (2) By grouping method 

(1) By inspection : If the frequency distribution is regular, then
mode may be calculated easily by inspection. Here the item which
has maximum frequency is the mode. A regular frequency distribution
means that the frequencies rise steadily to a maximum point and then
decrease continuously. If the frequencies sometimes decrease and
sometimes increase, then this method should not be used. Generally
grouping is considered the best method in each condition.

(2) Calculation of mode by grouping method : When the frequency
distribution in a series is not regular, or the maximum frequency
occurs at two or more places, then mode is determined by the grouping
method. The method of grouping is as follows :

First a table is prepared having 7 columns. The items in the series
are recorded in its first column and the corresponding frequencies are
written in the next column, which we call the first column of grouping.
We do the grouping in the last 5 columns as follows :

Second column : Frequencies are grouped by two's beginning
from first.

Third column : Frequencies are grouped by two's leaving the
first.

Fourth column : Frequencies are grouped in three's starting from
first.

Fifth column : Frequencies are grouped in three's leaving the
first.

Sixth column : Frequencies are grouped in three's leaving first
two.

Construction of Analysis Table : In the analysis table all the items
are written in the first row in such a way that there is one item in
each column. Six rows are then drawn below it, one for each of the six
columns of the above grouping, to record where the maximum frequency of
that column lies. We put a mark (✓) against the item which has maximum
frequency in column first. After it we put marks (✓) against the items
whose frequencies are contained in the maximum total of column second.
Similarly we put marks against the items contained in the maximum totals
of all the remaining columns. At last we find the total of marks (✓) for
each item. The item which has the maximum total is the mode. Following is
the frame of the analysis table :

Col. No.    Size of items containing maximum frequency
1
2
3
4
5
6
Total


Illustration 2. 
Calculate mode of the following distribution : 


Size         14  15  16  17  18  19  20  21  22  23  24  25
Frequency     2   2   6   8  31  22  15  31   6   3   2   1
Solution : 


Since two items have maximum frequency 31 in this Question, 
hence it is essential here to use grouping method. 


Grouping Table

(1) Frequency, (2) total of two's, (3) total of two's leaving first,
(4) total of three's, (5) total of three's leaving first, (6) total of
three's leaving first two. Each total is entered against the first item
of its group.

Size  (1)    (2)    (3)    (4)    (5)    (6)
14    2      4             10
15    2             8             16
16    6      14                          45
17    8             39     61
18    31     53                   68
19    22            37                   68
20    15     46            52
21    31            37            40
22    6      9                           11
23    3             5      6
24    2      3
25    1

To analyse the results obtained from grouping, the analysis table will
be constructed as follows :
Analysis Table

Items having maximum frequency
Col. No.   14  15  16  17  18  19  20  21  22  23  24  25
1                          ✓           ✓
2                          ✓   ✓
3                      ✓   ✓
4                      ✓   ✓   ✓
5                          ✓   ✓   ✓
6                              ✓   ✓   ✓
Total                  2   5   4   2   2

Here the item against the maximum total 5 is 18. Hence
mode will be 18.
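The grouping and analysis tables can be reproduced mechanically. The following is a minimal sketch, not the book's own procedure in code form: the function name `grouping_method` is hypothetical, and ties for the maximum total within a column give a mark to every tied group.

```python
def grouping_method(items, freqs):
    """Return the number of marks each item receives in the analysis table."""
    n = len(freqs)
    marks = [0] * n
    # (group size, number of leading frequencies left out) for columns 1 to 6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        groups = [range(s, s + size) for s in range(skip, n - size + 1, size)]
        totals = [sum(freqs[k] for k in g) for g in groups]
        best = max(totals)
        for g, total in zip(groups, totals):
            if total == best:            # every item of a maximum group is marked
                for k in g:
                    marks[k] += 1
    return dict(zip(items, marks))


sizes = list(range(14, 26))
freqs = [2, 2, 6, 8, 31, 22, 15, 31, 6, 3, 2, 1]
marks = grouping_method(sizes, freqs)

print({s: m for s, m in marks.items() if m})   # {17: 2, 18: 5, 19: 4, 20: 2, 21: 2}
print(max(marks, key=marks.get))               # 18
```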


(c) Calculation of Mode in Continuous Series : For calculation
of mode in continuous series, we first have to find the modal class.
For it we use two methods, as in discrete series : first by inspection
and second by grouping. If the frequency distribution is regular, then
the modal class is the class having maximum frequency. But if the
frequency distribution is irregular, it is more suitable to use the
grouping method. We use the grouping method in continuous series in the
same way as we have used it in the case of discrete series. After finding
the modal class we use the following formula to calculate mode :

(1) Mode $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$
or (2) $Z = L_2 - \frac{f_1 - f_2}{2f_1 - f_0 - f_2}(L_2 - L_1)$

where, Z = Mode

$L_1$ = Lower limit of modal class

$L_2$ = Upper limit of modal class

$f_1$ = Frequency of modal class

$f_0$ = Frequency of the class preceding the modal class

$f_2$ = Frequency of the class succeeding the modal class

When the modal class is determined by the grouping method, it should
be noted that the value of mode found with the above formula
sometimes falls outside the limits of the modal class. In such a
situation, any one of the following formulae is used :

(1) $Z = L_1 + \frac{f_2}{f_0 + f_2}(L_2 - L_1)$
(2) $Z = L_2 - \frac{f_0}{f_0 + f_2}(L_2 - L_1)$
(3) $Z = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$
(4) $Z = L_2 - \frac{\Delta_2}{\Delta_1 + \Delta_2} \times i$

where, $\Delta_1 = |f_1 - f_0|$ (Neglecting + and - signs)
$\Delta_2 = |f_1 - f_2|$ (Neglecting + and - signs)
$i = L_2 - L_1$, the width of the modal class

Note : Formulae (3) and (4) may also be used in general condition.

Illustration 3. 
From the following data, calculate mode : 


Size 15-30 30-45 45-60 60-75 75-90 90-105 
Frequency 15 20 44 26 13 10 


Solution : 


It is determined by inspection that the frequency distribution is regular.
Therefore, the class having maximum frequency, that is 45-60, will be
modal class.

We shall use the following formula for the determination of mode :

$$ Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1) $$

According to Question

$L_1 = 45$, $L_2 = 60$, $f_1 = 44$, $f_0 = 20$, $f_2 = 26$

Substituting the values in the formula,

$$ Z = 45 + \frac{44 - 20}{2 \times 44 - 20 - 26}(60 - 45) = 45 + \frac{24}{42} \times 15 = 45 + 8.57 = 53.57 $$
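The substitution in Illustration 3 can be written as a small function. This is a sketch only; the name `grouped_mode` and its argument order are assumptions made for illustration.

```python
def grouped_mode(l1, l2, f0, f1, f2):
    """Mode of a continuous series once the modal class is known.

    l1, l2: lower and upper limits of the modal class
    f0, f1, f2: frequencies of the preceding, modal and succeeding classes
    """
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)


print(round(grouped_mode(45, 60, 20, 44, 26), 2))   # 53.57
```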
Illustration 4. 

Calculate mode from the following data : 

Class        0-4   4-8   8-12   12-16   16-20   20-24   24-28   28-32

Frequency     7     5     9      17      15      14       3       6


Solution : 
Grouping Table

(1) Frequency, (2) two's from beginning, (3) two's leaving first,
(4) three's from beginning, (5) three's leaving first, (6) three's
leaving first two. Each total is entered against the first class of
its group.

Item     (1)    (2)    (3)    (4)    (5)    (6)
0-4      7      12            21
4-8      5             14            31
8-12     9      26                          41
12-16    17            32     46
16-20    15     29                   32
20-24    14            17                   23
24-28    3      9
28-32    6
Analysis Table

Col. No.   0-4    4-8    8-12   12-16  16-20  20-24  24-28  28-32
1                               ✓
2                                      ✓      ✓
3                               ✓      ✓
4                               ✓      ✓      ✓
5                                      ✓      ✓      ✓
6                        ✓      ✓      ✓
Total                    1      4      5      3      1

So the class having the maximum total 5 is 16-20. It is the modal
class.

Now,
$$ Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1) $$

Taking 16-20 as modal class, we have :

$L_1 = 16$, $L_2 = 20$, $f_1 = 15$, $f_0 = 17$, $f_2 = 14$

Substituting the values in the formula

$$ Z = 16 + \frac{15 - 17}{2 \times 15 - 17 - 14}(20 - 16) = 16 + \frac{-2}{-1} \times 4 = 16 + 8 = 24 $$

Since 24 comes outside the modal class (16-20), we use the
alternative formula,

$$ Z = L_1 + \frac{f_2}{f_0 + f_2}(L_2 - L_1) = 16 + \frac{14}{17 + 14}(20 - 16) = 16 + \frac{56}{31} = 16 + 1.806 = 17.806 $$

Hence the value of mode will be 17.806.
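The rule "use the alternative formula when the first result falls outside the modal class" can be sketched as below. The function name `mode_with_fallback` is hypothetical, and the fallback shown is the first of the alternative formulae, the one used in Illustration 4.

```python
def mode_with_fallback(l1, l2, f0, f1, f2):
    """Apply the usual mode formula; fall back when it leaves the modal class."""
    denominator = 2 * f1 - f0 - f2
    if denominator != 0:
        z = l1 + (f1 - f0) / denominator * (l2 - l1)
        if l1 <= z <= l2:
            return z
    # Alternative formula: Z = L1 + f2 / (f0 + f2) * (L2 - L1)
    return l1 + f2 / (f0 + f2) * (l2 - l1)


print(round(mode_with_fallback(16, 20, 17, 15, 14), 3))   # 17.806
print(round(mode_with_fallback(45, 60, 20, 44, 26), 2))   # 53.57
```

The second call shows that a regular distribution is unaffected, because the first formula already gives a value inside the modal class.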
Merits, Demerits and Uses of Mode 

Merits : 

(1) Mode can be determined easily. 

(2) It can be easily located by mere inspection in certain cases. 

(3) It is the item having maximum frequency, hence it represents
the series as a whole. No other average has this quality.

(4) It can be located by graph also.

(5) It is not affected by extreme values. 

(6) If it is to study the popularity of a commodity, then mode is very 
useful. 

Uses :


(1) Mode is used in business forecasting.
(2) The meaning of average in business field is mode.
(3) Mode is used to know the popularity of commodities.


Illustration 5. 
Find mode of the following items : 
0, 1, 6, 7, 2, 3, 7, 6, 6, 2, 6, 0, 5, 6, 0. 


Solution : 
Arranging the data,


Item         0   1   2   3   5   6   7
Frequency    3   1   2   1   1   5   2
Item 6 has the maximum frequency.
∴ Mode Z = 6


Illustration 6. 
Find mode from the following table : 


Class Interval    0-20    20-40    40-60    60-80    80-100
Frequency           2       7        10       3         3


Solution :
The class 40-60 has the maximum frequency.

∴ Modal class is 40-60.

$$ \text{Mode, } Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} $$

where $L_1$ = Lower limit of modal class = 40

$f_0$ = Frequency of the class preceding the modal class = 7
$f_1$ = Frequency of the modal class = 10

$f_2$ = Frequency of the class succeeding the modal class = 3

i = Magnitude of the class interval = 60 - 40 = 20

$$ \therefore \text{Mode, } Z = 40 + \frac{(10 - 7) \times 20}{2 \times 10 - 7 - 3} = 40 + \frac{60}{10} = 40 + 6 = 46 $$


Illustration 7. 
Find out missing frequency X in the following distribution if the value of mode is 
22: 


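Class Intervals    0-6    6-12    12-18    18-24    24-30    30-36
Frequency           5      7       11       17        X        6


Solution :
The value of mode is 22.
∴ Modal class is 18-24
$$ \text{Mode} = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} $$
where $L_1 = 18$, $f_1 = 17$, $f_0 = 11$, $f_2 = X$, i = 24 - 18 = 6

$$ 22 = 18 + \frac{(17 - 11) \times 6}{2 \times 17 - 11 - X} $$

$$ 22 - 18 = \frac{36}{23 - X} \Rightarrow 4 = \frac{36}{23 - X} $$
⇒ 92 - 4X = 36
⇒ 92 - 36 = 4X

⇒ 56 = 4X ∴ X = 14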
Illustration 8. 
Find out mode for the following distribution function : 
Marks              1-5   6-10   11-15   16-20   21-25   26-30   31-35   36-40   41-45
No. of Students     7     10     16      32      24       8      10       5       1


Solution :
First we shall convert the inclusive series into exclusive series. For
it we subtract 0.5 from lower limit and add 0.5 to upper limit.
Marks         Frequency
0.5-5.5           7
5.5-10.5         10
10.5-15.5        16
15.5-20.5        32
20.5-25.5        24
25.5-30.5         8
30.5-35.5        10
35.5-40.5         5
40.5-45.5         1
Here maximum frequency is 32. Hence modal class is 15.5-20.5

$$ \text{Mode, } Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} $$

where $L_1 = 15.5$, $f_0 = 16$, $f_1 = 32$, $f_2 = 24$, i = 20.5 - 15.5 = 5

$$ \text{So Mode, } Z = 15.5 + \frac{(32 - 16) \times 5}{2 \times 32 - 16 - 24} = 15.5 + \frac{80}{24} = 15.5 + 3.3 = 18.8 $$


Illustration 9. 
Find out mode in the following table : 


Item value    0-5   5-10   10-15   15-20   20-25   25-30   30-35
Frequency      1     2      10       4      10       9       2
Solution: 


In the above table two classes have the maximum frequency 10. 
So grouping method will be used to determine the modal class. 


Grouping Table

Each total is entered against the first class of its group.

Item-value    Grouping of frequencies
              (1)    (2)    (3)    (4)    (5)    (6)
0-5           1      3             13
5-10          2             12            16
10-15         10     14                          24
15-20         4             14     23
20-25         10     19                   21
25-30         9             11
30-35         2


Analysis Table


Column Class having maximum frequency 
Number 

10-15 15-20 20—25 25-30 30-35 

1 10-15 20-25 

2 20-25 25-30 

3 15-20 20-25 

4 15-20 20-25 25-30 

5 20-25 25-30 30-35 

6 10-15 15-20 20-25 


Total 2 3 6 3 1 


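It is clear from the above analysis table that 20-25 is modal class
because its total is maximum.

$$ \text{Mode } Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} $$

where $L_1 = 20$, $f_1 = 10$, $f_0 = 4$, $f_2 = 9$, i = 5

$$ \therefore \text{Mode } Z = 20 + \frac{(10 - 4) \times 5}{2 \times 10 - 4 - 9} = 20 + \frac{30}{7} = 20 + 4.29 = 24.29 $$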


Illustration 10. 
Find out mode for the following frequency distribution : 
Class Frequency Class Frequency 
0-2 1 18-20 12 
2-3 2 20-22 10 
3-6 1 22-24 8 
6-8 4 24-29 11 
8-11 3 29-30 9 
11-12 5 30-34 6 
12-15 18 34-36 3 
15-18 7 
Solution: 
In this Question class intervals are not the same, so the
adjustment of frequencies on the basis of equal class intervals is
essential to find mode. Here the width of class 24-29 is 5, but the
classes cannot be reconstructed by taking the width of classes as 5.
Hence we reconstruct the classes by taking the width of the
classes as 6.
Group     Frequency
0-6       1 + 2 + 1 = 4
6-12      4 + 3 + 5 = 12
12-18     18 + 7 = 25
18-24     12 + 10 + 8 = 30
24-30     11 + 9 = 20
30-36     6 + 3 = 9
So it is clear that the class 18-24 is the modal class because its
frequency is maximum. Now

$$ \text{Mode} = Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} $$

where $L_1 = 18$, $f_0 = 25$, $f_1 = 30$, $f_2 = 20$, i = 6

$$ \text{That is } Z = 18 + \frac{(30 - 25) \times 6}{2 \times 30 - 25 - 20} = 18 + \frac{30}{15} = 18 + 2 = 20 $$


Illustration 11. 
Find out mode from the following table : 


Class 10—20 20-30 30-40 40-50 50-60 
Frequency 40 25 20 10 7 


Solution : 
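Here the first class has the maximum frequency. Hence 10-20 is the
modal class.

$$ Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} $$
where $L_1 = 10$, $f_0 = 0$, $f_1 = 40$, $f_2 = 25$, i = 10

$$ Z = 10 + \frac{(40 - 0) \times 10}{2 \times 40 - 0 - 25} = 10 + \frac{400}{55} = 10 + 7.27 = 17.27 $$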


Illustration 12. 
Find mode from the following table : 


Wages (in dollar)    0-2   2-4   4-6   6-8   8-10   10-12   12-14
Person                3     7     8    10     1       2       3
(Indore 2003) 


Solution : 

Here frequency increases and attains the maximum frequency 10, 
then it decreases and again it increases. So grouping method will be 
used to determine the modal class. 

Each total is entered against the first class of its group.

Class     Frequency
          I      II     III    IV     V      VI
0-2       3      10            18
2-4       7             15            25
4-6       8      18                          19
6-8       10            11     13
8-10      1      3                    6
10-12     2             5
12-14     3


Analysis Table

Column    Classes having maximum frequency
I         6-8
II        4-6, 6-8
III       2-4, 4-6
IV        0-2, 2-4, 4-6
V         2-4, 4-6, 6-8
VI        4-6, 6-8, 8-10

Class     0-2   2-4   4-6   6-8   8-10   10-12   12-14
Total      1     3     5     4     1       0       0
It is clear from the above table that 4-6 is the modal class.
$L_1 = 4$, $f_0 = 7$, $f_1 = 8$, $f_2 = 10$, i = 2

$$ \text{Now mode } Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} = 4 + \frac{(8 - 7) \times 2}{2 \times 8 - 7 - 10} = 4 + \frac{2}{-1} = 4 - 2 = 2 $$

Thus mode = 2. It is outside of the modal class (4-6). In such
situation we shall use the alternative formula to find mode.
Alternative Formula

$$ Z = L_1 + \frac{f_2}{f_0 + f_2} \times i $$
Here, $f_0$ , $f_2$ have the same meanings as they have in the
previous formula.

$$ Z = 4 + \frac{10}{7 + 10} \times 2 = 4 + \frac{20}{17} = 4 + 1.176 = 5.176 $$


Illustration 13. 
Find mode from the following table : 

Marks No. of students 

0-10 5

10-20 10 

20-30 15 

30—40 0 

40-50 10 

50-60 13 

60-70 7 

70-80 5 

80-90 3 

90-100 1 


Solution : 

Here the frequency of the class 30—40 is zero. Hence grouping 
method is not possible. In such situation the following formula will be 
helpful to find the mode. 

$$ Z = 3M - 2\bar{x} $$

Calculation of A.M. and Median
Class     Frequency   Mid value   dx' = (x - 45)/10    fdx'    Cumulative
          (f)         (x)                                      Frequency
0-10         5           5             -4               -20        5
10-20       10          15             -3               -30       15
20-30       15          25             -2               -30       30
30-40        0          35             -1                 0       30
40-50       10          45              0                 0       40
50-60       13          55              1                13       53
60-70        7          65              2                14       60
70-80        5          75              3                15       65
80-90        3          85              4                12       68
90-100       1          95              5                 5       69
Total       69                                          -21

$$ \text{A.M. } (\bar{x}) = A + \frac{\sum f\,dx'}{N} \times i = 45 + \frac{-21 \times 10}{69} = 45 - \frac{210}{69} = 45 - 3.04 = 41.96 $$

$$ \text{Median } M = L_1 + \frac{\left(\frac{N}{2} - c\right) i}{f} $$

Here, $\frac{N}{2} = \frac{69}{2} = 34.5$ and c.f. just greater than 34.5 is 40. So 40-50 is
median class. Thus, $L_1 = 40$, c = 30, f = 10, i = 10

$$ \text{Hence } M = 40 + \frac{(34.5 - 30) \times 10}{10} = 40 + \frac{45}{10} = 40 + 4.5 = 44.5 $$

Now mode $= Z = 3M - 2\bar{x}$
= 3 × 44.5 - 2 × 41.96
= 133.5 - 83.92 = 49.58
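The last step of Illustration 13 is a one-line calculation. The sketch below takes the mean and median already found above as inputs; the function name `empirical_mode` is hypothetical.

```python
def empirical_mode(mean, median):
    """Mode estimated from median and mean: Z = 3M - 2 * mean."""
    return 3 * median - 2 * mean


print(round(empirical_mode(mean=41.96, median=44.5), 2))   # 49.58
```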
Illustration 14. 
Find mode from the following tables : 
Class 1-5 6-10 11-15 16—20 21-25 
Frequency 7 10 20 25 40 
Solution : 


In this Question, classes are in inclusive series. We shall convert it 
into exclusive series to find mode. 
Class (Exclusive) Frequency 
0.5-5.5 7 
5.5-10.5 10 
10.5-15.5 20 
15.5-20.5 25 
20.5—25.5 40 


Here last class has the maximum frequency 40. So this last class, 
i.e. , 20.5—25.5 is the modal class. Thus, 


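$L_1 = 20.5$, $f_0 = 25$, $f_1 = 40$, $f_2 = 0$, i = 5

$$ \text{Mode } Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} = 20.5 + \frac{(40 - 25) \times 5}{2 \times 40 - 25 - 0} = 20.5 + \frac{75}{55} = 20.5 + 1.36 = 21.86 $$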


Illustration 15. 
Find mode from the following table : 


Mid value    15   25   35   45   55   65   75   85
Frequency     5    9   13   21   20   15    8    3
(Rewa 2004) 
Solution : 
Here mid values of the classes are given. Width of class is 10. So
10/2 = 5 will be subtracted from mid values and 5 will be added to mid
values to find the limits of the classes. Here the maximum frequency is
21, but the neighbouring frequency 20 is almost as large, so the grouping
method is used.

Class        10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90
Frequency      5       9      13      21      20      15       8       3

Each total is entered against the first class of its group.

Class     Frequency
          (i)    (ii)   (iii)  (iv)   (v)    (vi)
10-20     5      14            27
20-30     9             22            43
30-40     13     34                          54
40-50     21            41     56
50-60     20     35                   43
60-70     15            23                   26
70-80     8      11
80-90     3


Analysis Table

Column    10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
(i)                            ✓
(ii)                                  ✓      ✓
(iii)                          ✓      ✓
(iv)                           ✓      ✓      ✓
(v)              ✓      ✓      ✓      ✓      ✓      ✓
(vi)                    ✓      ✓      ✓
Total            1      2      5      5      3      1

It is seen from the above analysis table that two classes, 40-50 and
50-60, have the maximum total 5 each. Hence we will test the
frequency density to find the modal class.

Modal Class    f_0    f_1    f_2    Total
40-50          13     21     20      54
50-60          21     20     15      56
Thus we can say that 50-60 will be modal class.

$$ \text{Now mode } Z = L_1 + \frac{(f_1 - f_0)\, i}{2f_1 - f_0 - f_2} = 50 + \frac{(20 - 21) \times 10}{2 \times 20 - 21 - 15} = 50 - \frac{10}{4} = 50 - 2.5 = 47.5 $$
which is outside the modal class. So the alternative formula will be
used.
Alternative Formula

$$ Z = L_1 + \frac{f_2}{f_0 + f_2} \times i = 50 + \frac{15}{21 + 15} \times 10 = 50 + \frac{150}{36} = 50 + 4.17 = 54.17 $$


Illustration 16. 
Find out the mode and median from the following Table : 


No. of days absent (less than) 5 10 15 20 25 30 35 40 45 
Frequency 29 224 465 582 634 644 650 653 655 


Solution : 
It is less than type series. So first we shall convert it into 
continuous series. 


Grouping Table

Each total is entered against the first class of its group.

No. of days   Real Frequency
absent        f (1)                   (2)    (3)    (4)    (5)    (6)    c.f.
0-5           (29 - 0) = 29           224           465                   29
5-10          (224 - 29) = 195               436           553           224
10-15         (465 - 224) = 241       358                         410    465
15-20         (582 - 465) = 117              169    179                  582
20-25         (634 - 582) = 52        62                   68            634
25-30         (644 - 634) = 10               16                   19     644
30-35         (650 - 644) = 6         9             11                   650
35-40         (653 - 650) = 3                5                           653
40-45         (655 - 653) = 2                                            655
              Σf = 655
Analysis Table

Item having maximum frequency
Column     0-5    5-10   10-15  15-20  20-25  25-30  30-35  35-40  40-45
1                        ✓
2                        ✓      ✓
3                 ✓      ✓
4          ✓      ✓      ✓
5                 ✓      ✓      ✓
6                        ✓      ✓      ✓
Total      1      3      6      3      1

So 10-15 will be modal class

$$ \text{Now } Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1) $$

$$ Z = 10 + \frac{241 - 195}{2 \times 241 - 195 - 117}(15 - 10) = 10 + \frac{46}{170} \times 5 $$

Z = 10 + 1.35 = 11.35 days

Calculation of Median

Median class m = Size of $\frac{N}{2}$ th item

= Size of $\frac{655}{2}$ th item
= Size of 327.5 th item

327.5 first comes in c.f. 465. So 10-15 will be median class.
$$ \text{Now } M = L_1 + \frac{L_2 - L_1}{f}(m - c) $$

$$ M = 10 + \frac{15 - 10}{241}(327.5 - 224) = 10 + \frac{517.5}{241} $$

⇒ M = 10 + 2.15

M = 12.15 days


Illustration 17. 
Calculate median and mode from the following data : 


Income (in ₹) above    0    10    20    30    40    50    60    70
No. of Persons        100   87    73    60    44    20    10     5


Solution : 


Each total is entered against the first class of its group.

Income    Frequency (f) (1)        (2)    (3)    (4)    (5)    (6)    c.f.
0-10      (100 - 87) = 13          27            40                    13
10-20     (87 - 73) = 14                  27            43             27
20-30     (73 - 60) = 13           29                          53      40
30-40     (60 - 44) = 16                  40     50                    56
40-50     (44 - 20) = 24           34                   39             80
50-60     (20 - 10) = 10                  15                   20      90
60-70     (10 - 5) = 5             10                                  95
70-80     (5 - 0) = 5                                                 100
          Σf = 100
Analysis Table
Item having maximum frequency
Column     0-10   10-20  20-30  30-40  40-50  50-60  60-70  70-80
1                                      ✓
2                                      ✓      ✓
3                               ✓      ✓
4                               ✓      ✓      ✓
5                 ✓      ✓      ✓
6                        ✓      ✓      ✓
Total             1      2      4      5      2


Since 40-50 has the maximum total, hence it is the modal
class

Applying the formula

$$ \text{Mode } Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1) $$

$$ Z = 40 + \frac{24 - 16}{2 \times 24 - 16 - 10}(50 - 40) = 40 + \frac{8}{22} \times 10 = ₹\ 43.64 $$
Calculation of Median

Median Class m = Size of $\frac{N}{2}$ th item = Size of $\frac{100}{2}$ th item
= Size of 50th item

50 first comes in c.f. 56. So 30-40 will be median class

$$ \text{Median } M = L_1 + \frac{L_2 - L_1}{f}(m - c) = 30 + \frac{40 - 30}{16}(50 - 40) = 30 + \frac{10 \times 10}{16} = 36.25 $$
So mode is ₹ 43.64 and median is ₹ 36.25.


Illustration 18. 
Find arithmetic mean, median and mode from the following data : 


Marks (obtained below) 80 70 60 50 40 30 20 10 
No. of Students 100 90 80 60 32 20 13 5


Solution : 
Marks No. of Marks No. of 


(less than) Students (less than) Students 
80 100 105 

70 90 Arranging the 20 13 

60 80 series in ascending 30 20 

50 60 order 40 32 

40 32 50 60 

30 20 60 80 

20 13 70 90 


10 5 80 100 
Calculation of mean and median by converting the less than type 

series into continuous series : 

Marks Mid value Frequency A = 40 /=5 cf. 
(x )( Ff) dx ( dx ) fox 

0-10 5 (5-0) =5-—35-—7-355 

10-20 15 (13-5) = 8- 25-—5-40 13 

20-30 25 (20-13) = 7-15-—3- 21 20 

30—40 35 (32-20) = 12-5-1- 12 32 

40-50 45 (60-32) = 28 5 1 28 60 

50-60 55 (80-60) = 20 15 3 60 80 

60—70 65 (90-80) = 10 25 5 50 90 

70-80 75 (100-90) = 10 35 7 70 100 


N = 100 = fdx = 100 

Mean =A+~< xi Medianclass m = Size of = th item 
= 40+ iw x 5=Sizeof = thitem 

= 45 Marks. = Size of 50th item 

50 first comes in c.f. 60 so median class is 

= 40-50 

Median M=L 4+ 7’ (m-—c) 

=40+ » (50-32) 

=40+ 2 x 18 

= 40 + 6.43 = 46.43 Marks. 


Calculation of Mode

By inspection, the frequencies of the series are regular and there is
sufficient difference among the frequencies around the highest point.
Hence the modal class, i.e., the class of highest frequency, is 40-50.

Mode $Z = L_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$
Substituting the values in the formula,

$Z = 40 + \dfrac{28 - 12}{2 \times 28 - 12 - 20}(50 - 40) = 40 + \dfrac{16}{24} \times 10$
= 40 + 6.67 = 46.67 Marks.

Hence mean = 45 marks, median M = 46.43 marks and mode Z
= 46.67 marks.
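
The three calculations of Illustration 18 can also be checked mechanically. The following is only a minimal sketch: the function name `grouped_averages` and its argument layout are assumptions made for illustration, and the modal class is simply taken as the class of highest frequency (the "by inspection" case), not found by the grouping method.

```python
def grouped_averages(upper_limits, cum_freq, width):
    """Mean, median and mode of a 'less than' cumulative series with equal class width."""
    # Class frequencies are successive differences of the cumulative frequencies.
    freqs = [c - p for c, p in zip(cum_freq, [0] + cum_freq[:-1])]
    lowers = [u - width for u in upper_limits]
    mids = [low + width / 2 for low in lowers]
    n = cum_freq[-1]

    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    # Median class: the first class whose c.f. reaches the N/2-th item.
    m = n / 2
    k = next(i for i, c in enumerate(cum_freq) if c >= m)
    c_prev = cum_freq[k - 1] if k > 0 else 0
    median = lowers[k] + width / freqs[k] * (m - c_prev)

    # Modal class by inspection: the class with the highest frequency.
    j = freqs.index(max(freqs))
    f1 = freqs[j]
    f0 = freqs[j - 1] if j > 0 else 0
    f2 = freqs[j + 1] if j + 1 < len(freqs) else 0
    mode = lowers[j] + (f1 - f0) / (2 * f1 - f0 - f2) * width
    return mean, median, mode


mean, median, mode = grouped_averages(
    [10, 20, 30, 40, 50, 60, 70, 80],
    [5, 13, 20, 32, 60, 80, 90, 100],
    10,
)
print(round(mean, 2), round(median, 2), round(mode, 2))
```

The sketch works directly from the mid values instead of the step-deviation short-cut, so it serves as an independent check on the table.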
Illustration 19. 
Calculate mode, median and mean : 
Below 3 5 7 10 13 15 20 25 30
Number 2 5 10 15 30 35 42 47 50 
Solution: 
Since the differences between the items are unequal, we shall first
construct the series by making classes of equal width.

Item     Frequency          Mid value    A = 12.5
         (f)                (x)          dx       f.dx     c.f.
0-5      5                  2.5          -10      -50      5
5-10     (15-5) = 10        7.5          -5       -50      15
10-15    (35-15) = 20       12.5         0        0        35
15-20    (42-35) = 7        17.5         5        35       42
20-25    (47-42) = 5        22.5         10       50       47
25-30    (50-47) = 3        27.5         15       45       50

$N = \sum f = 50$, $\sum f\,dx = 30$

Mean $\bar{X} = A + \dfrac{\sum f\,dx}{N} = 12.5 + \dfrac{30}{50} = 12.5 + 0.6 = 13.1$
So mean = 13.1

Median class : m = Size of $\frac{N}{2}$th item = Size of 25th item
25 first comes in c.f. 35, so median class = 10-15

Median $M = L_1 + \dfrac{i}{f}(m - c) = 10 + \dfrac{5}{20}(25 - 15) = 10 + \dfrac{50}{20} = 10 + 2.5 = 12.5$
Hence Median = 12.5

Calculation of Mode :
The frequency distribution in the series is perfectly regular, hence the
class against the highest frequency, i.e., 10-15, will be the modal class.

Mode $Z = L_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$

$= 10 + \dfrac{20 - 10}{40 - 10 - 7}(15 - 10) = 10 + \dfrac{10 \times 5}{23} = 12.17$

So $\bar{X}$ = 13.1, M = 12.5 and Z = 12.17.


Illustration 20. 

There are 100 students in a class. Their marks have been tabulated in a
frequency distribution having seven class intervals of equal size, the first class
interval being '0-10'. The cumulative frequencies of the 4th, 5th and 6th class intervals
are 35, 65 and 85 respectively. Calculate the median.

Also find the mode and mean if the frequency of the second class is double of 
the first class but equal to the third class and the frequency of the fourth class is 
one-third of the fifth class. 

Solution: 

Arranging the frequency distribution and cumulative frequency in a
table according to the question :

m       0-10    10-20   20-30   30-40   40-50   50-60   60-70
f                                                               $\sum f = 100$
c.f.                            35      65      85      100

Since the total number of students is 100, the cumulative frequency
of the last class will be 100 and the total of the frequencies $\sum f$ will
also be 100. From the differences of c.f., the frequencies of the classes
40-50, 50-60 and 60-70 will be 65 - 35 = 30, 85 - 65 = 20 and
100 - 85 = 15 respectively.

Let the frequency of the first class be x. So, according to the question,
the frequency of class 10-20 will be 2x and the frequency of the
class 20-30 will also be 2x. Since the frequency of the 5th class is 30,
the frequency of the 4th class will be $\frac{30}{3} = 10$. This can be
arranged in the table as follows :

m       0-10    10-20   20-30   30-40   40-50   50-60   60-70
f       x       2x      2x      10      30      20      15      $\sum f = 100$

So x + 2x + 2x + 10 + 30 + 20 + 15 = 100

5x + 75 = 100
5x = 25
x = 5
So the frequency of 0-10 will be 5, the frequency of 10-20 will be 10 and
the frequency of 20-30 will be 10. Preparing the table again from the above
calculation :
m        f      c.f.    m.v.    dx (A = 35)    f.dx
0-10     5      5       5       -30            -150
10-20    10     15      15      -20            -200
20-30    10     25      25      -10            -100
30-40    10     35      35      0              0
40-50    30     65      45      10             300
50-60    20     85      55      20             400
60-70    15     100     65      30             450

         100                                   $\sum f\,dx = 700$

(1) Calculation of median :
m = size of $\frac{N}{2}$th item
= size of 50th item

50 comes first in c.f. 65, so 40-50 will be the median class. Hence,
$M = L_1 + \dfrac{i}{f}(m - c)$
$M = 40 + \dfrac{10}{30}(50 - 35) = 40 + \dfrac{10 \times 15}{30} = 45$ marks

(2) Calculation of mean :

$\bar{X} = A + \dfrac{\sum f\,dx}{N} = 35 + \dfrac{700}{100} = 42$ marks

(3) Calculation of mode :

By inspection, the modal class will be 40-50.

$Z = L_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$

$= 40 + \dfrac{30 - 10}{60 - 10 - 20}(50 - 40)$

$= 40 + \dfrac{20}{30} \times 10 = 46.67$ marks
Exercise 7 (C) 


1. Calculate mode for the following data : 
25, 15, 23, 40, 27, 25, 23, 25, 20. 


[ Ans. : Z = 25] 

2. Calculate mode from the following data of marks obtained by 10 students : 
S.No. 1 2 3 4 5 6 7 8 9 10
Marks obtained 10 27 24 12 27 27 20 18 15 30 

[ Ans. : Z = 27] 

3. Find out mode from the following table : 
Size of item 5 6 7 8 9 10 11 12
Frequency 4 6 8 10 9 10 8 7

[ Ans. : Z = 9] 

4. The following data are about the size of shoes sold at a shop : 
Size of shoes 5 5.5 6 6.5 7 7.5 8 8.5


No. of pairs 12 15 20 40 60 38 22 10 
Find suitable average. 


[ Ans.: Z = 7] 

5. Find out the mode from the following series : 
Size of item 1 2 3 4 5 6 7 8 9 10 11
Frequency 2 17 13 15 20 25 23 24 20 23 15 

[ Ans.: Z = 7] 

6. Find out mode from the following series : 
Wages 50-60 60-70 70-80 80-90 90-100 
No. of workers 11 17 32 28 12 
[ Ans. : Z = 77.89] 


7. Find Mean, Median and Mode from the following distribution : 


Marks Frequency Marks Frequency 
10-25 6 55-70 26 

25—40 20 70-85 3 

40-55 44 85-100 1 


[ Ans.: $\bar{X}$ = 47.95, M = 48.35, Z = 48.57] (Vikram 2004; Jiwaji 2005 )
8. A company received various orders of the following size and quantity. Find the 
mode order : 
Size Number 
2 and under 4 9 
4 and under 6 27 
6 and under 8 45 
8 and under 10 54 
10 and under 12 42 
12 and under 14 30 
14 and under 16 9 
[ Ans. : Z = 8.86] 
9. Change the following data into simple frequency distribution and find out mode : 
The marks of 2 students less than 4, marks of 5 students less than 8, marks of 13 
students less than 12, marks of 16 students less than 16, marks of 20 students 
less than 20. (Bilaspur 2006 ) 
[Ans. : Z = 10] 
10. Calculate mode from the following series : 
Wages (in ₹) No. of workers
below 30 3 
30-40 5 
40-50 12 
50-60 20 
60-70 10 
above 70 4 
[ Ans. : Mode = 54.44] 


11. Find out the mode from the following data : 


Size of item 0-4 4-8 8-12 12-16 16-20 20-24 24-28 28-32 32-36 36—40 


Frequency 5 7 9 17 15 14 6 3 1 0
[ Ans. : Z = 17.8]
[ Hint : The answer obtained by the general formula falls outside the Z-class, so we
shall find the value by the formula $Z = L_1 + \dfrac{f_2}{f_0 + f_2} \times i$ ]


12. The table below gives the distribution of age and marriage of males in a 
certain country. Find out the 
modal age : 
Age in years less than 21 21-25 25-30 30-35 35—40 40-55 above 55 
No. of males 136 979 1183 378 222 198 97 
[ Ans. : Z = 26.01] 
13. Find mode from the following table : 
Marks obtained 1—5 6—10 11-15 16—20 21—25 26-30 31-35 36-40 41-45 
No. of students 7 10 16 32 24 18 10 5 1
[ Ans. : Z = 18.83] 
14. The following tables gives the distribution of male population of certain area in 
India. Find modal age: 
Age group 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99 
Males 1677 2124 2756 1481 1021 610 245 67 16 3 
[ Ans. : Z = 22.81 years] 
15. Calculate the mode from the following frequency distribution : 
Age 10-19 20-29 30-39 40-49 50-59 60-69 
Frequency 6 15 12 10 6 3
[ Ans. : Z = 35.5] 
16. From the following data, determine the mode : 
Marks 10-12 12-14 14-16 16-18 18-20 20-22 22-24 24-26 Total 
No. of students 2 6 20 32 33 17 8 2= 120 


[ Ans. : Z = 18.12] 
17. Find out median and mode from the following data : 

Wages 0-50 50-100 100-150 150-200 200-250 

No. of workers 10 8 12 15 5

[ Ans. : M = 129.17, Z = 161.54] (Bilaspur 2005) 

18. Find out mean, median and mode from the following series : 

Mid value 5 15 25 35 45 55 

Frequency 4 6 10 7 3 2
[ Ans. : A.M. = 26.56, Median = 26 and Mode = 25] 
19. Calculate mode from the following frequency distribution : 

Mid-point 123456 

Frequency 2 4 10 5 5 3

[ Ans. : Z = 3.25] 

20. Calculate mode from the following table : 

Mid-value 5 15 25 35 45 55 65 75 

Frequency 15 20 25 24 12 31 71 52 
[ Ans. : Z = 66.78] 
21. From the following series calculate Mean, Median and Mode : 

Value in ₹ 10-20 10-30 10-40 10-50 10-60 10-70 10-80 10-90

Frequency 4 16 56 97 124 137 146 150
[ Ans.: $\bar{X}$ = 46.33, M = 44.63, Z = 40.67] (Indore 2005)
22. Find out mode, mean and median from the following data : 

Marks less than 10 20 30 40 50 60 70 80 

No. of students 15 35 60 84 96 127 198 250 
[ Ans. : Mode = 66.78, A.M. = 50.4, Median = 59.35] (Indore 2009; Bilaspur 
2006) 


23. Calculate mode from the following data : 
Marks above 0 10 20 30 40 50 60 70 80 90 100 
No. of students 80 77 22 65 55 43 28 16 10 80 
[ Ans. : Z = 55 marks] 


24. From the following data, find Mode, Median and Mean : 
Marks obtained No. of Students 

below 5 7 

below 10 20 

5-15 38 

15 and above 55 

20-25 20 

25 and above 5 

30 and above 1 


[ Ans. : Mode = 16.67, Median = 15.83, Mean = 15.45] 
25. From the following distribution find missing frequency ‘ f ’ if the value of Mode 
is 41: 
Class interval 0—10 10-20 20-30 30-40 40-50 50-60 
Frequency 2 9 12 f 16 7
[ Ans. : f = 15] 
26. From the following incomplete distribution, find the missing frequencies : 
Class-interval 0-10 10—20 20-30 30-40 40-50 


Frequency 3 ? 20 12 ?
The value of median and mode are 27 and 26 respectively. 


[ Ans. : f = 8 and 7] 

27. Find the missing frequencies if mode is 24 : 
Expenditure 0—10 10-20 20-30 30-40 40-50 Total 
No. of families 14 — 27 — 15 100 

[ Ans. : 23, 21] 


28. Find median and mode for the first twelve natural number 


[Ans.: M = 6.5, Z = 3M - 2$\bar{X}$ = 6.5]
GEOMETRIC MEAN 
The geometric mean of n observations is the n th root of their 
product. Thus, it is obtained by multiplying together all the values 
and then extracting the relevant root of the product, it is denoted by 
G or G.M. 
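(1) If $x_1, x_2, \ldots, x_n$ be the n values of the variate, then

G.M. $= (x_1 \cdot x_2 \cdots x_n)^{1/n}$
$\Rightarrow$ log G.M. $= \dfrac{1}{n}(\log x_1 + \log x_2 + \ldots + \log x_n) = \dfrac{1}{n}\sum \log x$

$\therefore$ G.M. = Antilog $\left(\dfrac{\sum \log x}{n}\right)$
(2) If $x_1, x_2, \ldots, x_n$ be the values of the variate and the
corresponding frequencies be $f_1, f_2, \ldots, f_n$ respectively, then

G.M. $= \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$, where $N = \sum f$
$\Rightarrow$ log G.M. $= \dfrac{1}{N}(f_1 \log x_1 + f_2 \log x_2 + \ldots + f_n \log x_n)$
$= \dfrac{1}{N}\sum f \log x$
$\therefore$ G.M. = Antilog $\left(\dfrac{\sum f \log x}{N}\right)$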
Computation Method of Geometric Mean 
(a) In Individual Series : The following is the method to find the
geometric mean in an individual series :

(i) First we take the logarithm of each value separately. For this, find
the characteristic from the observation and the mantissa from the
logarithm table.

(ii) We add all the logarithms obtained above ( $\sum \log x$ ).

(iii) We calculate the geometric mean with the help of the following
formula :

G.M. = Antilog of $\left(\dfrac{\sum \log x}{N}\right)$

That is, we look up the antilog of the value obtained by dividing the
sum of the logarithms by the number of items. For this we use the table
of anti-logarithms; the value obtained from the anti-logarithm table is our
geometric mean.
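
The log-and-antilog procedure above can be sketched in a few lines. This is only an illustrative sketch: the function name `geometric_mean` is an assumption, base-10 logarithms are computed directly instead of being read from tables, and the data are the ten incomes of Illustration 1.

```python
import math


def geometric_mean(values):
    """Geometric mean as the antilog of the mean of base-10 logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log  # taking the antilog


incomes = [85, 70, 15, 75, 500, 8, 45, 250, 40, 36]
gm = geometric_mean(incomes)
print(round(gm, 2))
```

The result agrees with the four-figure table value of 58.03 up to rounding in the second decimal place.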


Illustration 1. 
The monthly income of 10 families in a locality is as follows. Find the Geometric 
Mean : 


Family A B C D E F G H I J

Income (in ₹) 85 70 15 75 500 8 45 250 40 36
Solution :
Family    Income (in ₹) x    log x
A         85                 1.9294
B         70                 1.8451
C         15                 1.1761
D         75                 1.8751
E         500                2.6990
F         8                  0.9031
G         45                 1.6532
H         250                2.3979
I         40                 1.6021
J         36                 1.5563

N = 10, $\sum \log x = 17.6373$

G.M. = Antilog of $\left(\dfrac{\sum \log x}{N}\right)$
= Antilog of $\dfrac{17.6373}{10}$
= Antilog of 1.76373
= 58.03



Illustration 2. 
From the following data, calculate the Geometric Mean : 
0.25, 0.0337, 0.2893, 0.00053, 0.5289, 2.4358. 


Solution : 

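S. No.    Item (x)    log x
1         0.25        $\bar{1}.3979$
2         0.0337      $\bar{2}.5276$
3         0.2893      $\bar{1}.4614$
4         0.00053     $\bar{4}.7243$
5         0.5289      $\bar{1}.7233$
6         2.4358      0.3866

N = 6, $\sum \log x = \bar{6}.2211$

G.M. = Antilog of $\left(\dfrac{\sum \log x}{N}\right)$
= Antilog of $\dfrac{\bar{6}.2211}{6}$
= Antilog of ( $\bar{1}.0369$ )
= 0.1089
So G.M. will be 0.1089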
(b) In Discrete Series

To find the geometric mean in a discrete series, we find the logarithm of
each item as in an individual series and multiply it by the corresponding
frequency. We add the products of logarithm and frequency ( $\sum f \log x$ )
and calculate the geometric mean through the following formula :

G.M. = Antilog of $\left(\dfrac{\sum f \log x}{N}\right)$
Illustration 3. 

From the following data, calculate Geometric Mean : 

Item 6 7 8 9 10 11 12
Frequency 4 6 9 13 8 6 4

Solution:
Item x    Frequency f    log x     f x log x
6         4              0.7782    3.1128
7         6              0.8451    5.0706
8         9              0.9031    8.1279
9         13             0.9542    12.4046
10        8              1.0000    8.0000
11        6              1.0414    6.2484
12        4              1.0792    4.3168

          $\sum f = 50$            $\sum f \log x = 47.2811$

G.M. = Antilog of $\left(\dfrac{\sum f \log x}{N}\right)$
= Antilog of $\dfrac{47.2811}{50}$
= Antilog of 0.9456
= 8.822
So geometric mean will be 8.822


(c) In Continuous Series 

To calculate the geometric mean in a continuous series, first the
mid-values of the classes are determined. The series then takes the
form of a discrete series, and the geometric mean is calculated in
the same way as in the case of a discrete series.


Illustration 4. 
Find Geometric Mean from the following data : 


Class 0-10 10-20 20-30 30-40 40-50

Frequency 4 6 14 8 4
Solution :

Class     Mid value (x)    Frequency (f)    log x     f.log x
0-10      5                4                0.6990    2.7960
10-20     15               6                1.1761    7.0566
20-30     25               14               1.3979    19.5706
30-40     35               8                1.5441    12.3528
40-50     45               4                1.6532    6.6128

$N = \sum f = 36$, $\sum (f \log x) = 48.3888$

G.M. = Antilog of $\left(\dfrac{\sum f \log x}{N}\right)$
= Antilog of $\dfrac{48.3888}{36}$

= Antilog of 1.3441

= 22.09

So geometric mean will be 22.09.
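
For discrete and continuous series the same idea applies with each logarithm multiplied by its frequency. The sketch below is illustrative only: `grouped_geometric_mean` is an assumed name, and for a continuous series the class mid values are passed in as the items, as in Illustration 4.

```python
import math


def grouped_geometric_mean(values, freqs):
    """Antilog of (sum of f * log x) / N for a frequency distribution."""
    n = sum(freqs)
    total = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (total / n)


# Illustration 4: mid values of the classes 0-10, ..., 40-50.
mid_values = [5, 15, 25, 35, 45]
frequencies = [4, 6, 14, 8, 4]
gm_continuous = grouped_geometric_mean(mid_values, frequencies)

# Illustration 3: a discrete series.
gm_discrete = grouped_geometric_mean([6, 7, 8, 9, 10, 11, 12],
                                     [4, 6, 9, 13, 8, 6, 4])
print(round(gm_continuous, 2), round(gm_discrete, 2))
```

Replacing the frequencies by weights gives the weighted geometric mean with no other change to the function.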


Weighted Geometric Mean 


If the relative importance of the different values is not the same, then
different weights may be given to the items for the geometric mean, as for
the arithmetic mean. The method of calculating the weighted geometric
mean is as follows :

(i) Find the logarithm of each item value separately.

(ii) Find the product of each logarithm and the corresponding weight
( W log x ).

(iii) Add the products ( $\sum W \log x$ ) and use the following formula
to calculate the weighted geometric mean :

W.G.M. = Antilog of $\left(\dfrac{\sum W \log x}{\sum W}\right)$

Here, W.G.M. = Weighted geometric mean

$\sum W$ = Total of all weights

With respect to calculation, the weighted geometric mean is equivalent to
the geometric mean in a discrete series. Here weight ( W ) is used
in the place of frequency ( f ).
Illustration 5. 

From the following data, find weighted geometric mean : 


Group Index No. Weight 
Food 125 7 

Clothes 133 5 

Fuel and Light 141 4 
Rent 173 1 

Misc. 182 3 


Solution: 
Calculation of Weighted Geometric Mean 


Group             Index No. (x)    Weight (W)    log x     W.log x
Food              125              7             2.0969    14.6782
Clothes           133              5             2.1239    10.6195
Fuel and Light    141              4             2.1492    8.5968
Rent              173              1             2.2380    2.2380
Misc.             182              3             2.2601    6.7803

Total             $\sum W = 20$                  $\sum (W \log x) = 42.9129$

W.G.M. = Antilog of $\left(\dfrac{\sum W \log x}{\sum W}\right)$

= Antilog of $\dfrac{42.9129}{20}$

= Antilog of 2.14564

= 139.8

So weighted geometric mean will be 139.8.
Other Use of Geometric Mean

The geometric mean is used to find the average of percentage
rates of increase and of ratios. For this, the following formulae based on
the geometric mean are used :

(i) $P_n = P_0(1 + r)^n$    (ii) $r = \sqrt[n]{\dfrac{P_n}{P_0}} - 1$

where, $P_n$ = Value of variate at the end of certain period

$P_0$ = Value of variate at the beginning of the period

n = Number of periods
r = Rate of change per unit


Illustration 6. 
The population of India in 1961 was 43.9 crores and in 1971 it rose to 54.7 
crores. Find out the percentage compound rate of growth per year. 


Solution : 
Formula for rate : $r = \sqrt[n]{\dfrac{P_n}{P_0}} - 1$
Substituting the values in the formula,

$r = \sqrt[10]{\dfrac{54.7}{43.9}} - 1$

$= \text{A.L.}\left[\dfrac{\log 54.7 - \log 43.9}{10}\right] - 1$

$= \text{A.L.}\left[\dfrac{0.0955}{10}\right] - 1$

= A.L. (0.00955) - 1

= 1.022 - 1 = 0.022 per unit

= 0.022 x 100 = 2.2 per cent.

So the compound rate of growth per year is 2.2%.
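
The rate formula can be evaluated directly without logarithm tables. The following is a minimal sketch; the function name `compound_growth_rate` is an assumption, and the figures are the census values of Illustration 6.

```python
def compound_growth_rate(start, end, periods):
    """Rate r per period such that start * (1 + r) ** periods == end."""
    return (end / start) ** (1 / periods) - 1


# Population in crores, 1961 to 1971 (10 yearly periods).
rate = compound_growth_rate(43.9, 54.7, 10)
print(f"{rate * 100:.1f} per cent per year")
```

Multiplying the starting value by (1 + r) ten times reproduces the closing value, which is a quick way to confirm the rate.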
Illustration 7. 

A machine is assumed to depreciate 30% in value in the first year, 20% in the 
second year, and 10% per annum for the next three years, each percentage being 
calculated on the diminishing value. What is the average percentage depreciation 
for the five years ? 


Solution : 

Year    Depreciated value (X)    log X
1       100 - 30 = 70            1.8451
2       100 - 20 = 80            1.9031
3       100 - 10 = 90            1.9542
4       100 - 10 = 90            1.9542
5       100 - 10 = 90            1.9542

n = 5, $\sum \log X = 9.6108$

G.M. = Antilog of $\left(\dfrac{\sum \log X}{n}\right)$
= Antilog of $\dfrac{9.6108}{5}$
= Antilog of 1.92216
= 83.60

Hence the average percentage rate of
depreciation = 100 - 83.6 = 16.4%.
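
The same reasoning can be sketched as code: each year's depreciation is converted to the value retained per 100, the geometric mean of those retained values is taken, and the result is subtracted from 100. The name `average_depreciation` is hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
import math


def average_depreciation(rates_percent):
    """Average yearly depreciation (%) when each rate applies to the diminishing value."""
    # Value retained each year, per 100 of the value at the start of that year.
    retained = [100 - r for r in rates_percent]
    gm = math.prod(retained) ** (1 / len(retained))
    return 100 - gm


avg = average_depreciation([30, 20, 10, 10, 10])
print(round(avg, 1))
```

An arithmetic mean of the five rates would give 16%, which understates the loss because the rates compound on a diminishing value.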
Merits, Demerits and Uses of Geometric Mean 

Merits : (1) It is rigidly defined.

(2) It is based on all the observations. 

(3) It is suitable for algebraic treatment. 

(4) It is the most suitable for ratio. 

(5) It is not much affected by fluctuation of sampling. 

Demerits : 

(1) It is not easy to understand. 

(2) Its calculation is comparatively difficult. 

(3) It cannot be calculated if a value is negative. 

(4) It does not give equal importance to all items. 

(5) It cannot be obtained by inspection alone.

Uses : 

(1) It is suitable when less importance is to be given to large items
and more to small ones.

(2) The geometric mean is the most suitable average for
series of ratios and compound rates of increase.

(3) This average is suitable if there is more skewness in frequency. 

(4) Geometric mean is suitable in the construction of Index 
Numbers. 

(5) We use geometric mean in population growth and percentage 
change in the price. 


Illustration 8. 
Find out Geometric Mean of 6, 12, 24. 


Solution : 
G.M. $= (6 \times 12 \times 24)^{1/3} = (6 \times 6 \times 2 \times 6 \times 4)^{1/3}$
$= (6^3 \times 2 \times 4)^{1/3} = (6^3 \times 2^3)^{1/3} = 6 \times 2 = 12$


Illustration 9. 
Find out Geometric Mean of 5, 10, 15, 20, 25, 30 and 35. 


Solution : 


X        log X
5        0.69897
10       1.00000
15       1.17609
20       1.30103
25       1.39794
30       1.47712
35       1.54407
Total    8.59522

If G.M. is the geometric mean, then

log G.M. $= \dfrac{1}{n}\sum \log X = \dfrac{1}{7} \times 8.59522 = 1.22789$

$\therefore$ G.M. = Antilog (1.22789) = 16.9


Illustration 10. 
Find out the geometric mean of the following frequency distribution : 


Marks 11 12 13 14 15
No. of Students 3 7 8 5 2
Solution :
X        f          log X     f log X
11       3          1.0414    3.1242
12       7          1.0792    7.5544
13       8          1.1139    8.9112
14       5          1.1461    5.7305
15       2          1.1761    2.3522

Total    N = 25               $\sum f \log X = 27.6725$
If G.M. is the geometric mean, then

log G.M. $= \dfrac{1}{N}\sum f \log X$

$= \dfrac{1}{25} \times 27.6725 = 1.1069$
$\therefore$ G.M. = Antilog (1.1069)
= 12.79


Illustration 11. 
Find out geometric mean of the following distribution : 


Marks 0-10 10—20 20-30 30-40 


No. of Students 5 8 3 4 
Solution : 


Class    Mid value (X)    f          log X     f log X
0-10     5                5          0.6990    3.4950
10-20    15               8          1.1761    9.4088
20-30    25               3          1.3979    4.1937
30-40    35               4          1.5441    6.1764

Total                     N = 20               $\sum f \log X = 23.2739$

If G.M. is the geometric mean, then
log G.M. $= \dfrac{1}{N}\sum f \log X$

$= \dfrac{23.2739}{20} = 1.1637$
$\therefore$ G.M. = Antilog (1.1637)
= 14.58


Illustration 12. 

The price of a commodity goes up by 10% in 1995, goes down by 10% in 1996 
and again goes up by 10% in 1997. Calculate average rate of price rise in the 
three years. 


Solution : 
Year     Price (X)          log X
1995     100 + 10 = 110     2.0414
1996     100 - 10 = 90      1.9542
1997     100 + 10 = 110     2.0414

Total                       6.0370

Now log G.M. $= \dfrac{\sum \log X}{n} = \dfrac{6.0370}{3}$
= 2.0123
$\therefore$ G.M. = Antilog (2.0123) = 102.9
$\therefore$ Average rate of increase = 102.9 - 100 = 2.9%


Illustration 13. 
A piece of property was purchased for ₹ 2,00,000 and sold 10 years later for
₹ 3,26,000. What is the average annual rate of return on the original investment ?

Solution :
Let the average annual growth factor be X, then

$2,00,000 \times X^{10} = 3,26,000$

$X^{10} = \dfrac{3,26,000}{2,00,000} = 1.63$

$\Rightarrow$ 10 log X = log 1.63

= 0.2122

log X $= \dfrac{0.2122}{10} = 0.02122$

$\therefore$ X = Antilog (0.02122)

= 1.05, i.e., 105 per 100

Average annual rate of return = 105 - 100 = 5%

HARMONIC MEAN 

The reciprocal of the value obtained by dividing the sum of the
reciprocals of the items $\left(\sum \frac{1}{X}\right)$ by the number of items ( N ) in a
series is called the harmonic mean. In other words, the Harmonic Mean is
the reciprocal of the arithmetic mean of the reciprocals of the
observations. The reciprocal of a number means the value obtained by
dividing 1 by that number. For example, the reciprocal of 4 is 1/4 or 0.25
and the reciprocal of 6 is 1/6 or 0.167. A reciprocal table may be used to
find the reciprocal of an item. It is read almost like a logarithm table;
the only difference is that when finding a logarithm the mean difference
is added, whereas here it is subtracted.
Computation Method of Harmonic Mean 
(a) Individual Series 

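(i) Reciprocals of all items $\left(\frac{1}{X}\right)$ are determined by general calculation
or by table.

(ii) The sum of the reciprocals $\left(\sum \frac{1}{X}\right)$ is obtained and the harmonic mean is
calculated by using the following formula :

H.M. = Reciprocal of $\left(\dfrac{\sum \frac{1}{X}}{N}\right)$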


Illustration 14. 
Find Harmonic Mean from the following data : 
6, 13.4, 18, 24, 3.83, 152, 0.034, 0.258, 35.3, 5.48. 
Solution : 
Item X    Reciprocal (1/X)
6         0.16667
13.4      0.07462
18        0.05556
24        0.04167
3.83      0.26109
152       0.00658
0.034     29.41176
0.258     3.87597
35.3      0.02833
5.48      0.18248

N = 10, $\sum \frac{1}{X} = 34.10473$

Harmonic mean (H.M.) = Reciprocal of $\left(\dfrac{\sum \frac{1}{X}}{N}\right)$
= Reciprocal of $\dfrac{34.10473}{10}$
= Reciprocal of 3.410473
= 0.29321
So Harmonic Mean is 0.29321.
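
The calculation of Illustration 14 can be reproduced with a short sketch. The function name `harmonic_mean` is an assumption for illustration; reciprocals are computed directly instead of being read from a reciprocal table.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean is taken over positive values here")
    return len(values) / sum(1 / v for v in values)


items = [6, 13.4, 18, 24, 3.83, 152, 0.034, 0.258, 35.3, 5.48]
hm = harmonic_mean(items)
print(round(hm, 5))
```

The very small item 0.034 contributes most of the reciprocal total, which is why the harmonic mean lies so close to the smallest values.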


(b) Computation Method in Discrete Series :

(i) The reciprocal of each item value is determined by the general calculation
method (dividing 1 by the item value) or from the reciprocal table.

(ii) We multiply the reciprocal of each value by the corresponding
frequency.

(iii) We find the total of the products $\left(\sum f \times \frac{1}{X}\right)$ and calculate the harmonic
mean with the help of the following formula :

H.M. = Reciprocal of $\left(\dfrac{\sum (f \times \text{Reciprocal of } X)}{N}\right)$

Here the symbol N can be used in the place of $\sum f$.


Illustration 15. 
Calculate Harmonic Mean from the following data : 


Item 5 10 15 20 25
Frequency 2 6 12 7 3

Solution :

Item (X)    Frequency (f)    Reciprocal (1/X)    f x (1/X)
5           2                0.2                 0.4
10          6                0.1                 0.6
15          12               0.0667              0.7999
20          7                0.05                0.35
25          3                0.04                0.12

$\sum f = 30$, $\sum (f \times \text{Rec. } X) = 2.26999$

H.M. = Reciprocal of $\left(\dfrac{\sum (f \times \text{Rec. } X)}{N}\right)$

= Reciprocal of $\dfrac{2.26999}{30}$

= Reciprocal of 0.07567

= 13.22

So Harmonic Mean will be 13.22.


(c) Computation Method of Harmonic Mean in Continuous
Series : To find the harmonic mean in a continuous series, first we find
the mid value of each class. The series is thus converted into a discrete
series. Now, taking the mid value as the item ( X ), we calculate the
harmonic mean in the same way as in the previous method.

Weighted Harmonic Mean : If the relative importance of the item
values in the series differs, the weighted harmonic
mean is calculated. Here a weight ( W ) is given to each item value
according to its importance, and the weighted harmonic mean is
calculated by the computation method of the harmonic mean in a
discrete series; the only difference is that here weight ( W ) is used
in the place of the frequency of the discrete series. So the formula for
calculation is as follows :

W.H.M. = Reciprocal of $\left(\dfrac{\sum \left(W \times \frac{1}{X}\right)}{\sum W}\right)$


Illustration 16. 
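Calculate the weighted harmonic mean from the following data :

Size 3 5 7 9

Weight 2 4 3 1
Solution :
Item (X)    Weight (W)    Reciprocal (1/X)    W x (1/X)
3           2             0.33333             0.66667
5           4             0.2                 0.8
7           3             0.14286             0.42857
9           1             0.11111             0.11111

$\sum W = 10$, $\sum \left(W \times \frac{1}{X}\right) = 2.00635$

W.H.M. = Rec. of $\left(\dfrac{\sum \left(W \times \frac{1}{X}\right)}{\sum W}\right)$
= Rec. of $\dfrac{2.00635}{10}$
= Rec. of 0.200635
= 4.9842

So the weighted harmonic mean is 4.9842.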


Illustration 17. 

Ishan covers his first 3 km. at an average speed of 8 km. per hour, another 2 
km. at 3 km. p.h. and last 2 km. at 2 km. p.h. Find his average speed for the entire 
journey. 


Solution : 


Speed (km/hour) X    Distance (km.) f    Reciprocal (1/X)    f x Rec. X
8                    3                   0.125               0.375
3                    2                   0.333               0.667
2                    2                   0.5                 1.000

$\sum f = 7$, $\sum (f \times \text{Rec. } X) = 2.042$
H.M. = Rec. of $\left(\dfrac{\sum (f \times \text{Rec. } X)}{\sum f}\right)$
= Rec. of $\dfrac{2.042}{7}$
= 3.428 km. per hour
So average speed of journey is 3.428 km. per hour.
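
Average speed over unequal distances is a weighted harmonic mean with the distances as weights, since total distance divided by total time is exactly that ratio. The sketch below assumes a hypothetical helper `weighted_harmonic_mean`; the first call uses the data of Illustration 17 and the second the four stages of Rikki's trip (600 km at 90, 200 km at 45, 500 km at 525 and 130 km at 60 km. p.h.).

```python
def weighted_harmonic_mean(values, weights):
    """Sum of weights divided by the sum of weight / value."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


# Illustration 17: speeds in km/h, distances in km used as weights.
ishan = weighted_harmonic_mean([8, 3, 2], [3, 2, 2])

# Train, boat, aeroplane and taxi stages of a 1430 km trip.
rikki = weighted_harmonic_mean([90, 45, 525, 60], [600, 200, 500, 130])
print(round(ishan, 3), round(rikki, 2))
```

With equal distances the weights cancel and the expression reduces to the simple harmonic mean of the speeds.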


Illustration 18. 

Gourav travels Khandwa to Indore at an average speed of 60 km. p.h. and 
returns along the same route at an average speed of 90 km. p.h. Find the average 
speed for the entire trip. 


Solution : 


Speed (km/hour) X    Reciprocal (1/X)
60                   0.016667
90                   0.011111

N = 2                $\sum \frac{1}{X} = 0.027778$

H.M. = Reciprocal of $\left(\dfrac{\sum \frac{1}{X}}{N}\right)$
= Reciprocal of $\dfrac{0.027778}{2}$
= 72 km. per hour
So average speed of the entire journey is 72 km. per hour.


Illustration 19. 

Rikki takes trip which entails 600 km. by train at 90 km. p.h., 200 km. by boat at 
45 km. p.h., 500 km. by Aeroplane at 525 km. p.h. and 130 km. by taxi at 60 km. 
p.h. What is the average speed ? 


Solution : 

Speed (km/hour) X    Distance travelled f    Reciprocal (1/X)    f x (1/X)
90                   600                     0.01111             6.66667
45                   200                     0.02222             4.44444
525                  500                     0.00190             0.95238
60                   130                     0.01667             2.16667

$\sum f = 1430$, $\sum \left(f \times \frac{1}{X}\right) = 14.23016$
H.M. = Rec. of $\left(\dfrac{\sum \left(f \times \frac{1}{X}\right)}{\sum f}\right)$

= Rec. of $\dfrac{14.23016}{1430}$

= Rec. of 0.0099512

= 100.49

So the average speed of Rikki over the whole journey is 100.49 km. per
hour.


Merits, Demerits and Uses of Harmonic Mean 
Merits : 
(1) It is based on all the items. 
(2) It is capable of algebraic treatment. 
(3) It is suitable average in time, rate, velocity. 
(4) It is least affected by fluctuation of sampling. 
Demerits : 
(1) It is difficult to understand. 


(2) Its computation is also difficult. 
(3) It gives more importance to small values. 


Uses : Harmonic mean is used to find the average of speed, rate 
and consumption. 


Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean

(a) The arithmetic mean is the largest, the geometric mean is less than
that and the harmonic mean is the smallest among these three types of
averages :

$\bar{X}$ > G.M. > H.M.
(b) But if the values of all the items are equal then the arithmetic
mean, geometric mean and harmonic mean are all equal to
each other, i.e.,

$\bar{X}$ = G.M. = H.M.
(c) The geometric mean of any two items is equal to the geometric
mean of their arithmetic mean and harmonic mean :

G.M. $= \sqrt{\bar{X} \times \text{H.M.}}$

or G.M.$^2 = \bar{X} \times \text{H.M.}$
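
These relations are easy to confirm numerically for a pair of positive values. The following is only a sketch; `three_means` is a hypothetical helper and the pair 60, 90 is chosen for illustration.

```python
import math


def three_means(a, b):
    """Arithmetic, geometric and harmonic mean of two positive numbers."""
    am = (a + b) / 2
    gm = math.sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm


am, gm, hm = three_means(60, 90)
print(am, round(gm, 2), hm)

# Relation (a): the ordering of the three averages.
print(am > gm > hm)
# Relation (c): G.M. squared equals A.M. times H.M. (up to rounding error).
print(math.isclose(gm ** 2, am * hm))
```

Relation (c) holds exactly only for two items; for three or more items only the ordering in (a) is guaranteed.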
Selection of Suitable Average 

Which average is the best average among the different types of 
averages ? To select it is a very difficult work. The reason behind it is 
that each average has its merit and demerit. However, selection of 
averages should be done in the following way by keeping in mind the 
purpose and frequency distribution : 

(1) Generally all qualities of an ideal average are found in 
arithmetic average. It can be used in all the places except some 
situations. It is suitable to study social, economic problems. 

(2) Geometric mean is used for population growth, economic rate 
of increase, compound rate of interest and construction of Index 
Numbers. 

(3) Harmonic mean is used to know the average speed. 

(4) Median is used for qualitative characteristics such as
intelligence level, honesty, qualification etc.


(5) Mode is mostly used for data related to business, industry
and fashion.


Illustration 20. 
Find the Harmonic Mean of 32, 35, 36, 58, 61, 73. 


Solution : 


X        Reciprocal (1/X)
32       0.03125
35       0.02857
36       0.02778
58       0.01724
61       0.01639
73       0.01370
Total    0.13493

H.M. = Reciprocal of $\left(\dfrac{\sum \frac{1}{X}}{N}\right)$
= Reciprocal of $\left(\dfrac{0.13493}{6}\right)$
= 44.47


Illustration 21. 
Find out Harmonic Mean of the following distribution : 


Marks 11 12 13 14 15
No. of Students 3 7 8 5 2

Solution :

Marks (X)    Frequency (f)    1/X       f x (1/X)
11           3                0.0909    0.2727
12           7                0.0833    0.5831
13           8                0.0769    0.6152
14           5                0.0714    0.3570
15           2                0.0667    0.1334

Total        N = 25                     $\sum \frac{f}{X} = 1.9614$
H.M. = Reciprocal of $\left(\dfrac{\sum \frac{f}{X}}{N}\right)$
= Reciprocal of $\left(\dfrac{1.9614}{25}\right) = \dfrac{25}{1.9614}$
= 12.75

Illustration 22. 


Find out Harmonic Mean of the following distribution : 
Class Interval 2-4 4-6 6-8 8-10
Frequency 20 40 30 10

Solution :

Class Interval    Mid value (X)    Frequency (f)    1/X       f x (1/X)
2-4               3                20               0.3333    6.666
4-6               5                40               0.2000    8.000
6-8               7                30               0.1428    4.284
8-10              9                10               0.1111    1.111

Total                              N = 100                    $\sum \frac{f}{X} = 20.061$

H.M. = Reciprocal of $\left(\dfrac{\sum \frac{f}{X}}{N}\right)$

= Reciprocal of $\left(\dfrac{20.061}{100}\right)$ = 4.98

Exercise 7 (D) 

Geometric Mean 
1. Find geometric mean from the monthly income of ten families : 
Income (in ~ ) : 400, 1300, 1250, 450, 150, 1200, 1178, 220, 350, 320. 
[ Ans. : G.M. = 522.9] 
2. Following are the monthly income of 10 families of a certain place. Find 
Geometric Mean : 

Families A B C D E F G H I J

Income (in © ) 850 700 150 750 5,000 80 450 2500 400 360 
[ Ans.: G.M. = 580.39] 


3. Calculate geometric mean from the following figures : 
1238, 178.7, 89.9, 78.4, 9.7, 0.874, 0.989, 0.012, 0.008, 0.0009. 


[ Ans.: G.M. = 2.019] 


4. Calculate the geometric mean of the data give below : 


Class of people       No. of families    Income per family (in ₹)
Landlord              1                  1000
Cultivators           50                 80
Landless labourers    25                 40
Money-lenders         2                  750
Teachers              3                  100
Shopkeepers           4                  150
Carpenters            3                  130
Weavers               5                  60

[ Ans.: G.M. = ₹ 73.91]
5. Calculate Geometric mean from the following data : 
Size of Item 10 11 12 13 14 15 16 
Frequency 2 4 5 3 3 2 1
[ Ans. : GM = 12.44] (Indore 2006) 
6. Geometric mean of the following incomplete distribution is 15.3. Find the 
missing frequency : 
X 8 25 17 30
f 5 3 4 ?
[ Ans. : f = 2] 
7. The following are the population figure of a city : 
Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 


Population (in lakhs) 109.3 129.7 134.6 157.8 172.3 190.6 210.5 230.9 260.4 
Find out the growth rate over the period. 


[ Ans. : G.M. = 1.115, i.e., Growth rate is 11.5%.] 
8. Calculate weighted geometric mean of the following Index Numbers of 
wholesale prices of different 


articles in India. 
Articles Price Relatives Weight 
1. Food articles 388.5 31 
2. Industrial raw materials 501.9 18 
3. Semi-manufacturers 402.3 17 
4. Raw-manufactures 384.6 30 
5. Miscellaneous 559.3 4 


[ Ans. : G.M. = 144] 

9. From the following data, find geometric mean of 100 wrestlers :
Weight (in kg) 100-110 110-120 120-130 130-140 140-150 150-160
No. of wrestlers 14 16 30 20 15 5

[ Ans. : G.M. = 126.739] 

10. Find arithmetic mean and geometric mean from the following : 

X 10-20 20-30 30-40 40-50 50-60 60-70 
Frequency 10 15 20 16 8 5

[ Ans.: $\bar{X}$ = 36.62, G.M. = 33.7]

11. The following table gives marks obtained by 70 students in mathematics. 
Calculate the Simple Mean and Geometric Mean of the series : 

Marks more than 70 60 50 40 30 20 
No. of students 7 18 40 40 63 70 


[ Ans.: G.M. = 46.39 marks, $\bar{X}$ = 49 marks]
Harmonic Mean 


12. Calculate harmonic mean in the following series : 
Age (in Years) 1 2 3 4 5 6 7 8 9
No. of Children 7 11 16 17 26 31 11 1 1 
[ Ans. : 3.53 years] 
13. Calculate the weighted harmonic mean from the data given below :
Marks 11 12 13 14 15
Weight 3 7 8 5 2
[ Ans. : H.M. = 12.74 marks] 
14. Calculate harmonic mean from the following series : 
Marks 0-10 10—20 20-30 30-40 40-50 
No. of Students 4 5 10 6 5
[ Ans. : 16.52] 


Geometric and Harmonic Mean 


15. From the following data, find geometric mean and harmonic mean : 
10, 17, 29, 95, 95, 100, 100, 175, 250, 750 


[ Ans. : G.M. = 82.49, H.M. = 40.75] 

16. Calculate Geometric and Harmonic mean from the following : 
S.No. 1 2 3 4 5 6 7
Size of item 2574 362 16 4 5 0.0075 0.0008634 

[ Ans. : G.M. = 2.947, H.M. = 0.0054] (Bilaspur 2005 ) 

17. The monthly income of 10 families in a certain locality are given below. 
Calculate the arithmetic mean, the geometric mean and the harmonic mean of 
incomes : 

Family A B C D E F G H I J
Income 85 70 10 75 500 8 42 250 40 36
[ Ans.: $\bar{X}$ = ₹ 111.6, G.M. = ₹ 55.35, H.M. = ₹ 28.78]


18. Calculate geometric mean of the following : 
2574, 475, 5, 0.8, 0.08, 0.005, 0.0009, 75 


[ Ans. : G.M. = 1.841] 


19. From the data given below, calculate geometric mean and harmonic mean : 
15, 250, 15.7, 157, 1.57, 105.7, 10.5, 1.06, 25.7, 0.257 


[ Ans. : G.M. = 12.75, H.M. = 1.737] 

20. Calculate the Arithmetic mean ( $\bar{X}$ ), Geometric mean (G.M.) and Harmonic
mean (H.M.) of 32, 35, 36, 37, 39, 41, 43, 47 and 48. Show that $\bar{X}$ > G.M. >
H.M.

[Ans.: $\bar{X}$ = 39.8, G.M. = 39.45, H.M. = 39.3]


QUARTILES 


Definition : When the data are arranged in ascending or descending
order, the three points on the scale of data which divide the data into
four equal parts, so that the frequencies remain equal in each part,
are called Quartiles.

If the data are in ascending order then the first point is called the first
quartile and is denoted by $Q_1$. The second point is called the second
quartile and is denoted by $Q_2$. The third point is called the third
quartile and is denoted by $Q_3$.

Formula of calculation of Quartiles : If N is total frequency and
data are in ascending order then

For individual and discrete series :

$Q_1 = \dfrac{N + 1}{4}$th item
$Q_2 = \dfrac{2(N + 1)}{4}$th item $= \dfrac{N + 1}{2}$th item
$Q_3 = \dfrac{3(N + 1)}{4}$th item

For continuous series :

$Q_1 = \dfrac{N}{4}$th item
$Q_2 = \dfrac{2N}{4}$th item $= \dfrac{N}{2}$th item
$Q_3 = \dfrac{3N}{4}$th item

$Q_1$, $Q_2$ and $Q_3$ are obtained directly by definition in
individual and discrete series, but in continuous series the quartile
classes are determined first and the quartiles are then calculated by a formula.
Characteristics of Quartiles : 


(1) The first quartile or lower quartile divides the total frequency in the
ratio 1 : 3. The first quartile is that value of a variable for which the less
than type cumulative frequency is equal to 25%, i.e., $\frac{N}{4}$.

(2) The second quartile (which is equal to the median) divides the total
frequency in the ratio 1 : 1. The second quartile is that value of a
variate for which the less than type cumulative frequency is equal
to 50%, i.e., $\frac{N}{2}$.

(3) The third quartile or upper quartile divides the total frequency in
the ratio 3 : 1. The third quartile is that value of a variable for which
the less than type cumulative frequency is equal to 75%, i.e., $\frac{3N}{4}$.

(4) The central 50% of the items lie between $Q_1$ and $Q_3$
(assuming the data are in ascending order).

Calculation of Quartiles in Individual Series 


Illustration 1. 
Calculate quartiles from the following data : 


Marks obtained : 07 32 39 20 15 45 35 12 08 22 25 02 26 28 42

Solution :

Arranging the data in ascending order :

S.No. :  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Marks : 02 07 08 12 15 20 22 25 26 28 32 35 39 42 45

Here number of items N = 15

$Q_1 = \frac{N+1}{4}$th item $= \frac{15+1}{4}$th item = 4th item = 12
$Q_2 = \frac{2(N+1)}{4}$th item $= \frac{2(15+1)}{4}$th item = 8th item = 25
$Q_3 = \frac{3(N+1)}{4}$th item $= \frac{3(15+1)}{4}$th item = 12th item = 35
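The rule used in Illustration 1 can be sketched in Python. This is only an illustrative sketch: the function name `quartile_individual` is an assumption made here, and the fractional part of the item number is handled by linear interpolation between neighbouring items, as is done in Illustration 2.

```python
def quartile_individual(values, k):
    """k-th quartile (k = 1, 2, 3) of an individual series.

    Uses the k(N + 1)/4-th item rule; a fractional item number is
    resolved by interpolating between the two neighbouring items.
    """
    data = sorted(values)
    n = len(data)
    pos = k * (n + 1) / 4      # 1-based item number, may be fractional
    lower = int(pos)           # whole part of the item number
    frac = pos - lower         # fractional part of the item number
    if lower < 1:              # item number falls before the first item
        return data[0]
    if lower >= n:             # item number falls on or after the last item
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


marks = [7, 32, 39, 20, 15, 45, 35, 12, 8, 22, 25, 2, 26, 28, 42]
print([quartile_individual(marks, k) for k in (1, 2, 3)])
```

For the marks of Illustration 1 the item numbers are whole (4, 8 and 12), so no interpolation is needed.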


Illustration 2. 

A manager while inspecting his departments found the number of absentees in
10 departments as follows : 21, 12, 15, 17, 18, 18, 19, 20, 0, 6

Calculate first and third quartile. 

Solution : 

Arranging the number of absentees in ascending order,

0, 6, 12, 15, 17, 18, 18, 19, 20, 21. Here N = 10

$Q_1 = \frac{N+1}{4}$th item $= \frac{10+1}{4}$th item = 2.75th item

or $Q_1$ = 2nd item + $\frac{3}{4}$ (3rd item - 2nd item)
$= 6 + \frac{3}{4}(12 - 6) = 6 + \frac{18}{4} = 6 + 4.5 = 10.5$

$Q_3 = \frac{3(N+1)}{4}$th item = 3 × 2.75th item = 8.25th item

or $Q_3$ = 8th item + $\frac{1}{4}$ (9th item - 8th item)
$= 19 + \frac{1}{4}(20 - 19) = 19 + \frac{1}{4} = 19 + 0.25 = 19.25$

CALCULATION OF QUARTILES IN DISCRETE 
SERIES 


Illustration 3. 
Calculate quartiles from the following data : 


x :  2  4  7  9 16 18 24
f :  7  9 25 22 18 11  8

Solution :

x      f      c.f.
2      7      7
4      9      16
7      25     41
9      22     63
16     18     81
18     11     92
24     8      100
Total  100

$Q_1 = \frac{N+1}{4}$th item $= \frac{100+1}{4}$th item

or $Q_1$ = 25.25th item
= 25th item + 0.25 (26th item - 25th item)
= 7 + 0.25 (7 - 7) = 7 + 0.25 (0) = 7 + 0 = 7

$Q_2 = \frac{2(N+1)}{4}$th item = 50.50th item

or $Q_2$ = 50th item + 0.5 (51st item - 50th item)
= 9 + 0.5 (9 - 9) = 9 + 0.5 (0) = 9 + 0 = 9

$Q_3 = \frac{3(N+1)}{4}$th item = 75.75th item

or $Q_3$ = 75th item + 0.75 (76th item - 75th item)
= 16 + 0.75 (16 - 16)
= 16 + 0.75 (0) = 16 + 0 = 16
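The procedure of Illustration 3 (locate the item number in the cumulative frequency column, then read off the value) can be sketched as follows. The names `item_at` and `quartile_discrete` are assumptions made for this sketch, and the values of x are assumed to be given in ascending order.

```python
from bisect import bisect_left
from itertools import accumulate


def item_at(xs, cum, pos):
    """Value of the pos-th item: the x whose c.f. first reaches pos."""
    return xs[bisect_left(cum, pos)]


def quartile_discrete(xs, fs, k):
    """k-th quartile of a discrete series by the k(N + 1)/4-th item rule."""
    cum = list(accumulate(fs))     # less than type cumulative frequencies
    n = cum[-1]                    # total frequency N
    pos = k * (n + 1) / 4
    lower = int(pos)
    frac = pos - lower
    lo = item_at(xs, cum, max(lower, 1))
    hi = item_at(xs, cum, min(lower + 1, n))
    return lo + frac * (hi - lo)


x = [2, 4, 7, 9, 16, 18, 24]
f = [7, 9, 25, 22, 18, 11, 8]
print([quartile_discrete(x, f, k) for k in (1, 2, 3)])
```

Because neighbouring items usually carry the same value in a discrete series, the interpolation term is normally zero, exactly as in the worked solution.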
Illustration 4. 
Locate the median and lower quartile from the following data :

Size of Shoe : 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
No. of pairs : 20  36  44  50  80  30  20  16  14

Solution :

Variable (x)   Frequency (f)   c.f.
4.0            20              20
4.5            36              56
5.0            44              100
5.5            50              150
6.0            80              230
6.5            30              260
7.0            20              280
7.5            16              296
8.0            14              310
Total          310

Lower quartile number $q_1 = \frac{N+1}{4} = \frac{310+1}{4} = \frac{311}{4} = 77.75$

So lower quartile $Q_1$ = 77.75th item
= 77th item + 0.75 (78th item - 77th item)
= 5.0 + 0.75 (5.0 - 5.0) = 5.0 + 0.75 × 0 = 5.0 + 0 = 5.0

Median number $m = \frac{N+1}{2} = \frac{310+1}{2} = \frac{311}{2} = 155.50$

Median, $M = \frac{\text{155th item} + \text{156th item}}{2} = \frac{6.0 + 6.0}{2} = 6.0$
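CALCULATION OF QUARTILES IN GROUPED SERIES

To calculate quartiles in a grouped series, first of all find the
$\frac{N}{4}$th, $\frac{2N}{4}$th and $\frac{3N}{4}$th items. The
quartile classes are determined on the basis of them.

Formula : $Q_k = L_1 + \frac{\frac{kN}{4} - c}{f}(L_2 - L_1)$ is used,
where k = 1, 2, 3
$L_1 - L_2$ = kth quartile class ($L_1$ its lower limit, $L_2$ its upper limit)
N = Total frequency
f = Frequency of the kth quartile class
c = Cumulative frequency of the class preceding the kth quartile class

Clearly, with class interval $i = L_2 - L_1$,

$Q_1 = L_1 + \frac{\frac{N}{4} - c}{f} \times i$

$Q_2 = L_1 + \frac{\frac{2N}{4} - c}{f} \times i$

$Q_3 = L_1 + \frac{\frac{3N}{4} - c}{f} \times i$

Here the values of $L_1$, c, f and i are taken according to the
respective class.

Note : We can write $q_1$ in the place of $\frac{N}{4}$, $q_2$ in the
place of $\frac{2N}{4}$ and $q_3$ in the place of $\frac{3N}{4}$.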
Illustration 5. 

Calculate quartiles from the following data : 

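Class-interval : 0-10 10-20 20-30 30-40 40-50
Frequency      : 3    8     20    12    7

Solution :

Class-interval   Frequency   c.f.
0-10             3           3
10-20            8           11
20-30            20          31
30-40            12          43
40-50            7           50
Total            50

$q_1 = \frac{N}{4} = \frac{50}{4} = 12.5$

$Q_1$ = $q_1$th item = 12.5th item
12.5th item is in the class 20-30.
So, $L_1 = 20$, $L_2 = 30$, $f = 20$, $c = 11$, $i = 10$

From the formula $Q_1 = L_1 + \frac{q_1 - c}{f} \times i$
$= 20 + \frac{12.5 - 11}{20} \times 10$
$= 20 + \frac{1.5 \times 10}{20} = 20 + \frac{15}{20} = 20 + 0.75 = 20.75$ ....(1)

$q_2 = \frac{2N}{4} = \frac{2 \times 50}{4} = 25$

$Q_2$ = $q_2$th item = 25th item
25th item is in the class 20-30.

From the formula $Q_2 = L_1 + \frac{q_2 - c}{f} \times i$
$= 20 + \frac{25 - 11}{20} \times 10$
or $Q_2 = 20 + \frac{14 \times 10}{20} = 20 + \frac{140}{20} = 20 + 7 = 27$ ....(2)

$q_3 = \frac{3N}{4} = \frac{3 \times 50}{4} = 37.5$

$Q_3$ = $q_3$th item = 37.5th item
37.5th item is in the class 30-40.
So, $L_1 = 30$, $L_2 = 40$, $f = 12$, $c = 31$, $i = 10$

From the formula $Q_3 = L_1 + \frac{q_3 - c}{f} \times i$
$= 30 + \frac{37.5 - 31}{12} \times 10$
$= 30 + \frac{65}{12} = 30 + 5.42 = 35.42$ ....(3)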
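The grouped-series formula applied in Illustration 5 can be sketched in Python. The function name `quartile_grouped` and the representation of classes as (lower, upper) pairs are assumptions made for this sketch; classes are assumed to be exclusive and in ascending order.

```python
from itertools import accumulate


def quartile_grouped(classes, fs, k):
    """k-th quartile of a continuous series: L1 + (kN/4 - c) / f * i."""
    cum = list(accumulate(fs))          # less than type cumulative frequencies
    n = cum[-1]                         # total frequency N
    q = k * n / 4                       # quartile number q_k
    for idx, (lower, upper) in enumerate(classes):
        if cum[idx] >= q:               # first class whose c.f. reaches q_k
            c = cum[idx - 1] if idx > 0 else 0
            return lower + (q - c) / fs[idx] * (upper - lower)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 8, 20, 12, 7]
print([round(quartile_grouped(classes, freqs, k), 2) for k in (1, 2, 3)])
```

The loop mirrors the manual step of scanning the c.f. column for the first value that is not less than the quartile number.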
Illustration 6. 

Calculate three quartiles from the following data : 

Class     : 9-11 12-14 15-17 18-20 21-23
Frequency : 4    11    20    9     8

Solution :
Conversion of inclusive classes into exclusive classes is
necessary :

                        Class       f    c.f.
                        8.5-11.5    4    4
First Quartile Class    11.5-14.5   11   15
Second Quartile Class   14.5-17.5   20   35
Third Quartile Class    17.5-20.5   9    44
                        20.5-23.5   8    52
Total                               N = 52

$q_1 = \frac{N}{4} = \frac{52}{4} = 13$

$Q_1$ = $q_1$th item = 13th item
13th item is in the class 11.5-14.5.
So, $L_1 = 11.5$, $L_2 = 14.5$, $f = 11$, $c = 4$, $i = 3$

From the formula $Q_1 = L_1 + \frac{q_1 - c}{f} \times i$
$Q_1 = 11.5 + \frac{13 - 4}{11} \times 3$
$= 11.5 + \frac{27}{11} = 11.5 + 2.45 = 13.95$ ....(i)

$q_2 = \frac{2N}{4} = \frac{2 \times 52}{4} = 26$

$Q_2$ = $q_2$th item = 26th item
26th item is in the class 14.5-17.5.
So, $L_1 = 14.5$, $L_2 = 17.5$, $f = 20$, $c = 15$, $i = 3$

From the formula $Q_2 = L_1 + \frac{q_2 - c}{f} \times i$
$= 14.5 + \frac{26 - 15}{20} \times 3$
$= 14.5 + \frac{33}{20} = 14.5 + 1.65 = 16.15$ ....(ii)

$q_3 = \frac{3N}{4} = \frac{3 \times 52}{4} = 39$

$Q_3$ = $q_3$th item = 39th item
39th item is in the class 17.5-20.5.
So, $L_1 = 17.5$, $L_2 = 20.5$, $f = 9$, $c = 35$, $i = 3$

From the formula $Q_3 = L_1 + \frac{q_3 - c}{f} \times i$
$Q_3 = 17.5 + \frac{39 - 35}{9} \times 3$
$= 17.5 + \frac{4 \times 3}{9} = 17.5 + \frac{12}{9} = 17.5 + 1.33 = 18.83$ ....(iii)
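The conversion of inclusive classes into exclusive classes, carried out at the start of Illustration 6, can be sketched as below. The function name `to_exclusive` is an assumption made here; the sketch assumes at least two classes with the same gap between every upper limit and the next lower limit.

```python
def to_exclusive(classes):
    """Convert inclusive classes such as (9, 11), (12, 14) into
    exclusive class boundaries such as (8.5, 11.5), (11.5, 14.5)."""
    gap = classes[1][0] - classes[0][1]   # e.g. 12 - 11 = 1
    half = gap / 2                        # correction applied to each limit
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(9, 11), (12, 14), (15, 17), (18, 20), (21, 23)]
print(to_exclusive(inclusive))
```

Half of the gap is subtracted from each lower limit and added to each upper limit, so the class interval becomes 3 while the frequencies stay unchanged.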
MISCELLANEOUS ILLUSTRATION 


Illustration 7. 
Find out quartiles and median from the following series : 


Wages (in ₹)   No. of Persons
Below 30       69
30-40          167
40-50          207
50-60          55
60-70          50
70-80          10
80-90          10
90-100         8

Solution :

Wage       Frequency (f)   c.f.
Below 30   69              69
30-40      167             236
40-50      207             443
50-60      55              498
60-70      50              548
70-80      10              558
80-90      10              568
90-100     8               576
Total      N = 576

$q_1 = \frac{N}{4} = \frac{576}{4} = 144$

First Quartile $Q_1$ = $q_1$th item = 144th item
144th item is in the class 30-40. So, $L_1 = 30$, $f = 167$, $c = 69$, $i = 10$

From the formula $Q_1 = L_1 + \frac{q_1 - c}{f} \times i$
$Q_1 = 30 + \frac{144 - 69}{167}(40 - 30)$
$= 30 + \frac{75 \times 10}{167} = 30 + 4.49 = 34.49$ ....(1)

$m = \frac{2N}{4} = 2 \times 144 = 288$

Median, M = mth item = 288th item
288th item is in the class 40-50.
$L_1 = 40$, $L_2 = 50$, $f = 207$, $c = 236$, $i = 50 - 40 = 10$

Formula $M = L_1 + \frac{m - c}{f} \times i$
$M = 40 + \frac{288 - 236}{207} \times 10$
$= 40 + \frac{52 \times 10}{207} = 40 + 2.51 = 42.51$ ....(2)

$q_3 = \frac{3N}{4} = 3 \times 144 = 432$

Third Quartile $Q_3$ = $q_3$th item = 432nd item
432nd item is in the class 40-50. So,

Formula $Q_3 = L_1 + \frac{q_3 - c}{f} \times i$
$Q_3 = 40 + \frac{432 - 236}{207} \times 10$
$= 40 + \frac{196 \times 10}{207} = 40 + 9.47 = 49.47$ ....(3)

Illustration 8. 
Find the limits of central 50% items from the data given below : 


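Class     : 0-3 3-6 6-12 12-20 20-24
Frequency : 6   21  25   16    12

Solution :

Limits of the central 50% items are $Q_1$ and $Q_3$, so we have to
calculate $Q_1$ and $Q_3$.

Class    Frequency (f)   c.f.
0-3      6               6
3-6      21              27
6-12     25              52
12-20    16              68
20-24    12              80
Total    N = 80

$q_1 = \frac{N}{4} = \frac{80}{4} = 20$, so the first quartile class is 3-6.
$c = 6$, $f = 21$, $L_1 = 3$, $i = 6 - 3 = 3$

$Q_1 = L_1 + \frac{q_1 - c}{f} \times i = 3 + \frac{20 - 6}{21} \times 3 = 3 + \frac{42}{21} = 3 + 2 = 5$ ....(1)

$q_3 = \frac{3N}{4} = \frac{3 \times 80}{4} = 60$, so the third quartile class is 12-20.
$c = 52$, $f = 16$, $L_1 = 12$, $i = 20 - 12 = 8$

$Q_3 = L_1 + \frac{q_3 - c}{f} \times i = 12 + \frac{60 - 52}{16} \times 8 = 12 + \frac{64}{16} = 12 + 4 = 16$ ....(2)

Thus, the limits of the central 50% items are 5 and 16.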
Illustration 9. 

In a group of 1,000 wage-earners the monthly wages of 4% are below ₹ 60 and
that of 15% under ₹ 62.50 but over ₹ 60, 15% earned ₹ 95 and over and 5% got
₹ 100 and over. Find the median wage and the lower and upper quartiles.

Solution :

Wages (in ₹)    Percentage of    Frequency   Cumulative frequency
                Wage Earners     (f)         (c.f.)
Less than 60    4%               40          40
60-62.50        15%              150         190
62.50-95        61% (Rest)       610         800
95-100          15%              150         950
100 and more    5%               50          1,000
Total           100%             1,000

$m = \frac{N}{2} = \frac{1000}{2} = 500$
Median Class 62.50-95

Formula $M = L_1 + \frac{m - c}{f} \times i$
$= 62.50 + \frac{500 - 190}{610} \times 32.50$
$= 62.50 + \frac{310}{610} \times 32.50$
$= 62.50 + 16.52 = 79.02$

$q_1 = \frac{N}{4} = \frac{1000}{4} = 250$
First Quartile Class 62.50-95

Formula $Q_1 = L_1 + \frac{q_1 - c}{f} \times i$
$Q_1 = 62.50 + \frac{250 - 190}{610} \times 32.50$
$= 62.50 + \frac{1,950}{610} = 62.50 + 3.20 = 65.70$

$q_3 = \frac{3N}{4} = \frac{3 \times 1000}{4} = 750$
So Third Quartile Class 62.50-95

Formula $Q_3 = L_1 + \frac{q_3 - c}{f} \times i$
$Q_3 = 62.50 + \frac{750 - 190}{610} \times 32.50$
$= 62.50 + \frac{560 \times 32.50}{610}$
$= 62.50 + \frac{18,200}{610} = 62.50 + 29.84 = 92.34$

Illustration 10. 
Find out mode and both quartiles from the following series : 
Marks more than : 70 60 50 40 30 20
No. of Students : 7  18 40 40 63 65

Solution :
Convert the cumulative frequency distribution into an ordinary
frequency distribution :

Class    Frequency (f)    Cumulative frequency (less than type)
20-30    65 - 63 = 2      2
30-40    63 - 40 = 23     25
40-50    40 - 40 = 0      25
50-60    40 - 18 = 22     47
60-70    18 - 7 = 11      58
70-80    7                65
Total    65

By the density method 2 + 23 + 0 = 25 and 0 + 22 + 11 = 33. So
we can take 50-60 as the modal class.

Mode, $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
$= 50 + \frac{22 - 0}{2 \times 22 - 0 - 11} \times 10$
$= 50 + \frac{22 \times 10}{44 - 11}$
$= 50 + \frac{220}{33} = 50 + 6.67 = 56.67$ marks

Alternative Method

Since the frequency of class 30-40 is maximum, the modal class
will be 30-40. Then,

Mode, $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
$= 30 + \frac{23 - 2}{2 \times 23 - 2 - 0} \times 10$
$= 30 + \frac{21 \times 10}{46 - 2}$
$= 30 + \frac{210}{44} = 30 + 4.77 = 34.77$ marks

First Quartile, $Q_1 = L_1 + \frac{\frac{N}{4} - c}{f} \times i$, where $\frac{N}{4} = \frac{65}{4} = 16.25$
$= 30 + \frac{16.25 - 2}{23} \times 10$
$= 30 + \frac{142.5}{23} = 30 + 6.20 = 36.20$ marks

Third Quartile, $Q_3 = L_1 + \frac{\frac{3N}{4} - c}{f} \times i$, where $\frac{3N}{4} = \frac{3 \times 65}{4} = 48.75$
$= 60 + \frac{48.75 - 47}{11} \times 10$
$= 60 + \frac{17.5}{11} = 60 + 1.59 = 61.59$ marks
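The first step of Illustration 10, converting a "more than" cumulative distribution into ordinary class frequencies, can be sketched as below. The function name `more_than_to_frequencies` is an assumption made for this sketch, and equal class widths are assumed.

```python
def more_than_to_frequencies(lower_limits, more_than_counts, width):
    """Turn 'more than' cumulative counts into (class, frequency) lists.

    lower_limits must be in ascending order; more_than_counts[i] is the
    number of items above lower_limits[i].
    """
    classes, freqs = [], []
    for i, lower in enumerate(lower_limits):
        # Items above the next limit do not belong to this class.
        if i + 1 < len(more_than_counts):
            above_next = more_than_counts[i + 1]
        else:
            above_next = 0
        classes.append((lower, lower + width))
        freqs.append(more_than_counts[i] - above_next)
    return classes, freqs


limits = [20, 30, 40, 50, 60, 70]
counts = [65, 63, 40, 40, 18, 7]
print(more_than_to_frequencies(limits, counts, 10))
```

Each class frequency is the difference between two successive "more than" counts, which is exactly the subtraction shown in the table of the solution.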


Illustration 11. 
Calculate mean, median, mode and quartiles from the following data : 


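Class         : 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50
Frequency (f) : 7     10    13    26    35    20    11    5

Solution :
Convert inclusive classes into exclusive classes :

Class       Frequency   Mid-value   $d' = \frac{x - 33}{5}$   $fd'$   c.f.
            (f)         (x)
10.5-15.5   7           13          -4                        -28     7
15.5-20.5   10          18          -3                        -30     17
20.5-25.5   13          23          -2                        -26     30
25.5-30.5   26          28          -1                        -26     56
30.5-35.5   35          33          0                         0       91
35.5-40.5   20          38          1                         20      111
40.5-45.5   11          43          2                         22      122
45.5-50.5   5           48          3                         15      127
Total       127                                               -53

Mean, $\bar{X} = A + \frac{\sum fd'}{N} \times i$
$= 33 + \frac{(-53) \times 5}{127}$
$= 33 - 2.09 = 30.91$ ....(1)

$q_1 = \frac{N}{4} = \frac{127}{4} = 31.75$, $q_2 = \frac{2N}{4} = 63.50$, $q_3 = \frac{3N}{4} = \frac{3 \times 127}{4} = 95.25$

First Quartile, $Q_1 = L_1 + \frac{q_1 - c}{f} \times i$
$= 25.5 + \frac{31.75 - 30}{26} \times 5$
$= 25.5 + \frac{8.75}{26} = 25.5 + 0.34 = 25.84$ ....(2)

Second Quartile, $Q_2 = L_1 + \frac{q_2 - c}{f} \times i$ = Median, M
$= 30.5 + \frac{63.50 - 56}{35} \times 5$
$= 30.5 + \frac{37.5}{35} = 30.5 + 1.07 = 31.57$ ....(3)

Third Quartile, $Q_3 = L_1 + \frac{q_3 - c}{f} \times i$
$= 35.5 + \frac{95.25 - 91}{20} \times 5$
$= 35.5 + \frac{21.25}{20} = 35.5 + 1.06 = 36.56$ ....(4)

Mode, $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
$= 30.5 + \frac{35 - 26}{2 \times 35 - 26 - 20} \times 5$
$= 30.5 + \frac{45}{24} = 30.5 + 1.875 = 32.375$ ....(5)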
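The mean and mode of Illustration 11 can be checked with a short sketch. The function names `step_deviation_mean` and `mode_grouped` are assumptions made here; the assumed mean A = 33 and class interval i = 5 are the ones used in the worked solution.

```python
def step_deviation_mean(mids, fs, assumed, width):
    """Arithmetic mean by the step deviation method: A + (sum of fd' / N) * i."""
    n = sum(fs)
    total_fd = sum(f * (x - assumed) / width for x, f in zip(mids, fs))
    return assumed + total_fd / n * width


def mode_grouped(lower, width, f1, f0, f2):
    """Mode of a grouped series: L1 + (f1 - f0) / (2 f1 - f0 - f2) * i.

    f1 is the frequency of the modal class, f0 that of the class before
    it and f2 that of the class after it.
    """
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


mids = [13, 18, 23, 28, 33, 38, 43, 48]
freqs = [7, 10, 13, 26, 35, 20, 11, 5]
print(round(step_deviation_mean(mids, freqs, 33, 5), 2))
print(mode_grouped(30.5, 5, 35, 26, 20))
```

The same `mode_grouped` call with lower = 50, width = 10, f1 = 22, f0 = 0 and f2 = 11 reproduces the mode found by the density method in Illustration 10.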
Exercise 7(E)

Quartile
1. Calculate upper quartile :
70, 15, 4, 2, 10, 16, 17, 3, 11, 7

[Ans. : $Q_3$ = 16.25]

2. Find median and $Q_1$ from the following data :

Class      0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency  25   15    20    15    20    30    65    60

[Ans. : $Q_1$ = 31.67, $Q_2$ = M = 60]

3. Find quartiles from the following table : 

Marks 0-5 5—10 10-15 15-20 20-25 25-30 30-35 

No. of Students 2 7 18 12 13 9 4

[Ans. : $Q_1$ = 12.01, $Q_2$ = 17.29, $Q_3$ = 23.75]


Mode, Median and Quartile 
4. Find the median, mode and third quartiles from the following data : 


Class : 0-10 10—20 20-30 30-40 40-50 50-60 60-70 70-80 

Frequency : 2 18 30 45 35 20 6 3 

[Ans. : M = 36.56 Marks, Z = 36 Marks, $Q_3$ = 49.93 Marks] (Vikram 2004)
5. Calculate mode, median and both quartiles from the following table : 

Marks 1-5 6—10 11-15 16—20 21-25 26-30 31-35 36—40 

No. of Students 3 10 20 30 20 9 5 3

[Ans. : Z = 18, M = 18.33, $Q_1$ = 13.5 and $Q_3$ = 23.5]


6. Find the values of mode, median and both the quartiles from the following 
frequency distribution : 


Class Frequency Class Frequency Class Frequency 
0-2 1 11-12 5 22-24 8 

2-3 2 12-15 18 24-29 11 

3-6 1 15-18 7 29-30 9 

6-8 4 18-20 12 30-34 6 

8-11 3 20-22 10 34-36 3 


[Ans. : Z = 20, M = 19.8, $Q_1$ = 14.16, $Q_3$ = 25.2. Hint : Form the
classes 0-6, 6-12, etc.]


Mean, Median and Quartile 
7. Calculate median, mean and both quartiles from the following data : 


Marks (below) 10 20 30 40 50 60 70 80 90 
No. of Students 20 44 76 104 126 144 182 192 200 
[Ans. : $\bar{X}$ = 40.6 Marks, M = 38.57 Marks, $Q_1$ = 21.875 Marks, $Q_3$ = 61.58
Marks] (Sagar 2004)
8. Calculate mean, median and third quartile from the following data : 


Central Point : 10 12 14 16 18 20 


Frequency : 3 7 12 18 10 5


Mean, Median, Mode and Quartile 
9. Calculate mean, median, mode and first quartiles from the following data : 


Age more than 0 10 20 30 40 50 60 70 80 90 100 
No. of persons 80 77 72 65 55 43 28 16 10 8 0
[Ans. : $\bar{X}$ = 51.75, M = $Q_2$ = 52, Z = 55, $Q_1$ = 35]


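Important Formulae

In each row the formulae are given for the individual series, the
discrete series and the continuous series respectively.

1. Arithmetic mean
Direct method : $\bar{X} = \frac{\sum X}{N}$ ; $\bar{X} = \frac{\sum fX}{N}$ ; $\bar{X} = \frac{\sum fX}{N}$
Short-cut method : $\bar{X} = A + \frac{\sum d}{N}$ ; $\bar{X} = A + \frac{\sum fd}{N}$ ; $\bar{X} = A + \frac{\sum fd}{N}$
Step deviation method : $\bar{X} = A + \frac{\sum d'}{N} \times i$ ; $\bar{X} = A + \frac{\sum fd'}{N} \times i$ ; $\bar{X} = A + \frac{\sum fd'}{N} \times i$
Common difference = i

2. Median
m = size of $\frac{N+1}{2}$th item ; m = size of $\frac{N+1}{2}$th item ; m = size of $\frac{N}{2}$th item and
$M = L_1 + \frac{i}{f}(m - c)$
When the series is given in descending order
$M = L_2 - \frac{i}{f}(m - c)$

3. Mode
The item occurring most frequently ; Grouping method ; (i) Calculation of modal
class by grouping or inspection method
(ii) $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$
Also $Z = 3M - 2\bar{X}$

4. Geometric mean
$G.M. = AL\left[\frac{\sum \log X}{N}\right]$ ; $G.M. = AL\left[\frac{\sum f \log X}{N}\right]$ ; $G.M. = AL\left[\frac{\sum f \log X}{N}\right]$

5. Harmonic mean
$H.M. = Rec.\left[\frac{\sum Rec. X}{N}\right]$ ; $H.M. = Rec.\left[\frac{\sum f \, Rec. X}{N}\right]$ ; $H.M. = Rec.\left[\frac{\sum f \, Rec. X}{N}\right]$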


THEORETICAL QUESTIONS 


Long Answer Questions 


1. What is meant by central tendency ? Describe various methods to measure it.

2. Give the definition of mean and describe its merits.

3. Write the definitions of Arithmetic mean, Geometric mean and Harmonic mean
and compare them on the basis of their merits and demerits.

4. Explain the relative merits and demerits of the central measures Mean, Median
and Mode. Which is the most suitable among them ? Give reasons.

5. Give the definitions of various measures of central tendency. Which objects do
these measures satisfy ?
Short Answer Questions 


1. What are the characteristics of an ideal average ? How far does the geometric
mean satisfy them ?

2. Write short notes on geometric mean and harmonic mean.

3. Describe the advantages and disadvantages of median and mode.

4. What is an average ? In which circumstances will you use the following averages :
(a) Arithmetic mean,

(b) Median,

(c) Mode.

5. What is meant by central tendency ?
6. Describe the various types of averages.

7. Describe the merits and demerits of Arithmetic mean.

8. Under what conditions will you use the following :

(i) Median, (ii) Mode ? (Ravisankar 2005 )

9. Write the formula to show the relationship among mean, median and mode.
(Rewa 2009 )

10. What do you understand by ‘statistical average’ ? (Rewa 2009 )


OBJECTIVE QUESTIONS 
State whether the following statements are true or false :
1. The least value among the items will be the mode.
2. Mode is affected by extreme values.
3. Geometric mean of 8 and 2 is 5.
4. If any number is negative then geometric mean is not possible.
5. The formula to find the mode with the help of the median and arithmetic mean
is $Z = 3M - 2\bar{X}$.
6. The algebraic sum of deviations from the mean is zero. (Rewa 2009 )
7. Average speed can be calculated by Harmonic mean. (Rewa 2009 )

[ Ans. 1. False, 2. False, 3. False, 4. True, 5. True, 6. True, 7. True ]
Choose the correct answers 


1. The arithmetic mean of 9, 18, 27, 36, 45 is: 

(a) 18 (b) 27 (c) 36 (d) 45 

2. Which is true : 

(a) Mean = 3 Median — 2 Mode (b) Median = 3 Mode — 2 Mean 

(c) Mode = 3 Median — 2 Mean (d) Mode = Mean + Median 

3. If the median of 3, 4, x , 8 is 5 then the value of x will be: 

(a) 3 (b) 4 (c) 5 (d) 6 

4. Geometric mean of 4, 8, 16, 32, 64 is: 

(a) 8 (b) 16 (c) 32 (d) 64 

5. The arithmetic mean of a distribution is 5. If each frequency is multiplied by 3 


then arithmetic mean will be : 


(a) 5 (b) 15 (c) 5/3 (d) 3/5 

6. If the mean and geometric mean of two numbers are 5 and 4 respectively then 
these number will 
be: 

(a) 6, 4 (b) 7, 3 (c) 2, 8 (d) 5, 5 

7. If mode is 10 and A.M. is 13 then the median will be : 

(a) 14 (b) 10 (c) 12 (d) 15 

8. Harmonic mean of 2 and 4 is: 

(a) 3 (b) 2.9 (c) 2.67 (d) 2.5 

9. The average size of shoes is 8 then here average mean is : 


(a) Arithmetic mean (b) Median 

(c) Mode (d) Geometric mean 

10. Correct formula of Arithmetic mean is : (Bhopal 2009 ) 
Kax+ he X-N4 

(a) Nv (b) 

[ Ans. 1. (b), 2. (c), 3. (d), 4. (b), 5. (a), 6. (c), 7. (c), 8. (c), 9. (c), 10. (a).]


Xfdx 
N 




8 
MEASURES OF DISPERSION 


It is clear from the study of different averages in the previous
chapter that statistical averages are suitable means of finding the
central tendency of a series, that is, they represent the series, but
they cannot give information about all the characteristics of the
series. For example, an average does not tell us the extent to which
the series is scattered, that is, how far on average the different
values lie from the central value of the series. So dispersion is
needed to know the spread or scatter in a series. This fact can be
easily understood by the following example :

Series ‘A’ Series ‘B’ Series ‘C’ 
5,000 4,900 1,000 

5,000 5,100 400 

5,000 5,500 9,000 

5,000 4,800 8,000 

5,000 4,700 6,600 


Total 25,000 25,000 25,000 
Average 5,000 5,000 5,000 


From the above example it is clear that the averages of all three
series are the same (5,000), so the three series seem alike, but in
reality they are not. The mean of series ‘A’ represents the whole
series perfectly because the deviation of every item from the mean is
zero. In series ‘B’ the smallest item is 4,700 and the largest item is
5,500. So the deviation of the smallest item from the mean is
4,700 - 5,000 = -300 and that of the largest item is
5,500 - 5,000 = +500. Thus the item values are scattered about the
mean, but the scatter is very small in comparison with the item
values, so the mean of this series also represents the series
properly. It is not so in series ‘C’. Here the smallest value is 400
and the largest value is 9,000, i.e., the deviation of the smallest
value from the mean is 400 - 5,000 = -4,600 and that of the largest
value is 9,000 - 5,000 = 4,000. The smallest item of the series is
less than one-tenth of the mean and the largest item is nearly double
the mean. So here the mean does not represent the series properly, and
series ‘C’ has the maximum scatter of the three series. Hence we
cannot obtain knowledge of all the characteristics of a series by
studying only its mean.

Due to this limitation of an average, there is an interesting
classical proverb in statistics, “Lekha Jokha Jyon Ka Tyon Sara Kunba
Dooba Kyon” (the calculation is exactly right, so why did the whole
family drown?). There is a short story behind this proverb. Once upon
a time a man was crossing a river on foot with his family. He had a
simple knowledge of statistics. Before entering the river, he measured
its depth at various places and calculated the average. He found that
the average height of his family members was more than the average
depth of the river. He concluded that his family could cross the river
and entered it. But at one place the river was deeper than its average
depth, and all his family members were drowned. Had he measured the
maximum depth of the river and the average deviation of the depths,
together with the minimum height of the family members and the average
deviation of the heights, then perhaps his family would have been
saved.

So it is clear that the knowledge of the mean alone is not sufficient
for full knowledge of the series; the study of the scatter or spread
of the series is also essential.

This scatter or spread of the series, or the average deviation of the
items about the mean in a series, is called ‘Dispersion’.
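The comparison of series ‘A’, ‘B’ and ‘C’ can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `mean_and_extreme_deviations` is an assumption made here.

```python
def mean_and_extreme_deviations(values):
    """Return the mean and the deviations of the smallest and
    largest items from that mean."""
    mean = sum(values) / len(values)
    return mean, min(values) - mean, max(values) - mean


series = {
    "A": [5000, 5000, 5000, 5000, 5000],
    "B": [4900, 5100, 5500, 4800, 4700],
    "C": [1000, 400, 9000, 8000, 6600],
}
for name, values in series.items():
    print(name, mean_and_extreme_deviations(values))
```

All three series return the same mean of 5,000, while the deviations of the extreme items grow from zero in ‘A’ to several thousand in ‘C’.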


MEANING OF DISPERSION 


In its general meaning, dispersion is the scatter or spread of the
items of a series about their mean. In its special meaning, dispersion
is used in two senses : (i) Dispersion means the spread of the limits
of the series, that is, the difference between the limits within which
the items of a series lie. (ii) In the second sense dispersion means
the average of the deviations of the items taken from the mean of the
series. The different methods of measuring dispersion are based on
these two senses.


DEFINITION 


According to Spiegel , “The degree to which numerical data tend
to spread about an average value is called the variation or dispersion
of the data.”

According to Simpson and Kafka , “The measurement of the
scatterness of the mass of figures in a series about an average is
called measure of variation or dispersion.”

According to Bowley , “Dispersion is the measure of the variation
of the items.”

According to Professor King , “Dispersion indicates the fact that
within a given group there is lack of uniformity in the item values.”

According to Brookes and Dick , “The dispersion or spread is the
degree of scatter or of variation of the variables about a central
value.”

According to Conner , “Dispersion is the measure of the
extent to which the individual items vary.”


ABSOLUTE AND RELATIVE DISPERSION 


Dispersion can be calculated in two ways : absolute and relative.
They are described as follows :

(a) Absolute measure : When the measure of dispersion or
variation in a series is determined in absolute form, that is, in the
unit of the series, it is called absolute dispersion.

(b) Relative measure : The biggest drawback of an absolute
measure is that two or more series cannot be compared by it, because
different series may have different units and values in different
units cannot be compared. So for comparative study a relative
measure of dispersion is determined. A relative measure is not
expressed in the unit of the series. It is expressed in the form of a
percentage or ratio, which is obtained by dividing the absolute
measure of dispersion by the corresponding measure of central
tendency. The relative measure of dispersion is also called the
coefficient of dispersion.

OBJECT OR IMPORTANCE OF DISPERSION 

The main objects of a measure of dispersion are as follows :

(1) To gauge the reliability of an average : An average is a
representative of a series. Dispersion is calculated in order to know
to what extent an average is representative of the series. If the
measure of dispersion is small, the average is good, i.e., the
average is a true representative of the items of the series.

(2) To know the structure of the series : An average is unable to
give knowledge of the structure of the series. This knowledge can be
obtained from the dispersion.

(3) To know the limits of the series : Knowledge about the
range or the limits of the series is obtained by finding the
dispersion, which is useful for various statistical purposes.

(4) To study the variability : Knowledge of the variability of the
item values of the series from an average may be obtained from the
dispersion.

(5) To compare the variability : By finding the relative measure
of dispersion of two series, the variability contained in them may be
compared. The series with the larger coefficient of variation has the
larger amount of variation.

(6) To serve as a base for other statistical measures :
Calculation of dispersion is essential to obtain some other statistical
measures such as correlation, regression etc.

(7) To serve as a base for the control of the variability :
A measure of dispersion does not only give knowledge about the
variability of the series but also provides a proper base to control
it. For example, to control the quality of industrial production, the
variability of the quality is determined and it is controlled by the
quality control method.


CHARACTERISTICS OF A GOOD MEASURE OF 
DISPERSION 


There should be the following characteristics in a good measure of 
dispersion : 

(1) It should be based on all item values. 

(2) It should be rigidly defined. 

(3) It should be easy to calculate. 

(4) Its mathematical description should be easy. 

(5) It should be capable of further algebraic treatment. 

(6) It should be least affected by fluctuation of sampling. 


METHODS OF MEASURING DISPERSION 


As is clear from the meaning of dispersion, dispersion is measured by
two main types of methods :

First, absolute methods, which include range, inter quartile range,
mean deviation and standard deviation, and second, relative methods,
which include coefficient of range, coefficient of quartile deviation,
coefficient of mean deviation, coefficient of standard deviation and
coefficient of variation. For convenience of study, the methods of
measuring dispersion can be classified in the following way :

(a) Methods of Limit

(1) Range

(2) Inter Quartile Range

(3) Percentile Range

(b) Methods of Averaging Deviation

(4) Quartile Deviation or Semi-inter Quartile Range.

(5) First Moment of Dispersion or Mean Deviation.

(6) Second Moment of Dispersion or Standard Deviation.

(7) Other Miscellaneous Measures.

(c) Graphic Method

(8) Lorenz Curve.

RANGE 


The difference between the largest and smallest values of the
series is called range. It is the simplest method of measuring
dispersion. If the range is small, the series is considered to be
regular, and if the range is large, irregular.

Calculation method : The largest and smallest values are determined
by inspection. There are two opinions about the determination of the
largest and smallest values in a continuous series :

(i) The lower limit of the smallest class is assumed as the smallest
value and the upper limit of the largest class is assumed as the
largest value.

(ii) The mid value of the smallest class is assumed as the smallest
value and the mid value of the largest class is assumed as the
largest value. The first of these two opinions is widely used.

After finding the smallest and largest values, the range is calculated
by the following formula :

Range $(R) = M_1 - M_0$

where, R = range
$M_1$ = largest value
$M_0$ = smallest value

Coefficient of range : To compare two or more series it is essential
to convert the absolute measure into a relative measure of
dispersion. The coefficient of range is calculated for this purpose.
Its formula is as follows :

Coefficient of Range $(C.R.) = \frac{M_1 - M_0}{M_1 + M_0}$
Illustration 1. (Individual series) 


From the following data calculate range and coefficient of range : 
15, 4, 18, 8, 10, 9, 20, 47 

Solution : 
Maximum value of series $M_1 = 47$
Minimum value of series $M_0 = 4$
Hence, Range $(R) = M_1 - M_0$
= 47 - 4 = 43

Coefficient of Range $(C.R.) = \frac{M_1 - M_0}{M_1 + M_0} = \frac{47 - 4}{47 + 4} = \frac{43}{51} = 0.843$
Hence the range of the series will be 43 and the coefficient of range
will be 0.843 or 84.3%.
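The two formulae used in Illustration 1 can be sketched in Python. The function name `range_and_coefficient` is an assumption made for this sketch. Note that the coefficient is undefined when the largest and smallest values add up to zero.

```python
def range_and_coefficient(values):
    """Range R = M1 - M0 and coefficient of range (M1 - M0) / (M1 + M0)."""
    largest, smallest = max(values), min(values)
    r = largest - smallest
    return r, r / (largest + smallest)


r, cr = range_and_coefficient([15, 4, 18, 8, 10, 9, 20, 47])
print(r, round(cr, 3))
```

Only the two extreme values enter the calculation, which is why the range is so simple to compute and also so sensitive to a single unusual item.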


Illustration 2. (Discrete series) 
Find the range and coefficient of range of the following data : 
Value      4  8  12  16  20  24  28
Frequency  5  8  14  25  18  7   3
Solution :
By inspection, minimum value $M_0 = 4$, maximum value $M_1 = 28$
Range $(R) = M_1 - M_0$
= 28 - 4 = 24
Coefficient of range $(C.R.) = \frac{M_1 - M_0}{M_1 + M_0} = \frac{28 - 4}{28 + 4} = \frac{24}{32} = 0.75$
So the range will be 24 and the coefficient of range will be 75%.
Illustration 3. 
Calculate absolute value and relative value of range from the 
following data : 


X  5  6  7   8   9   10  11  12
Y  4  8  13  19  40  22  7   3

Solution :
Absolute value of range, $R = M_1 - M_0$
R = 12 - 5 = 7

Relative value of range = coefficient of range
$= \frac{M_1 - M_0}{M_1 + M_0}$
$= \frac{12 - 5}{12 + 5} = \frac{7}{17} = 0.412$
Illustration 4. (Continuous Series) 
Find the range and coefficient of range of the following frequency 
distribution : 
Size       0-10  10-20  20-30  30-40  40-50
Frequency  4     6      12     9      2
Solution :
$M_0$ = Lower limit of the lowest class 0-10 = 0
$M_1$ = Upper limit of the greatest class 40-50 = 50

Hence, Range $(R) = M_1 - M_0$
= 50 - 0 = 50

Coefficient of range $(C.R.) = \frac{M_1 - M_0}{M_1 + M_0}$
$= \frac{50 - 0}{50 + 0} = \frac{50}{50} = 1$

Hence the range will be 50 and the coefficient of range will be 100%.
Illustration 5. (Continuous series in inclusive series) 

Calculate the range and its coefficient of the following distribution : 


Class      5-9  10-14  15-19  20-24  25-29
Frequency  2    5      9      6      3
Solution :
Since the series is inclusive, it is essential to convert it into an
exclusive series before finding the range.

Class       Frequency
4.5-9.5     2
9.5-14.5    5
14.5-19.5   9
19.5-24.5   6
24.5-29.5   3

Here the lower limit of the lowest class 4.5-9.5 is 4.5 and the upper
limit of the highest class 24.5-29.5 is 29.5. So
$R = M_1 - M_0$
= 29.5 - 4.5 = 25
Coefficient of range $(C.R.) = \frac{M_1 - M_0}{M_1 + M_0} = \frac{29.5 - 4.5}{29.5 + 4.5} = \frac{25}{34}$
= 0.735 or 73.5%
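For a continuous series the same calculation can be sketched from the class limits alone, following the first of the two opinions described earlier (lower limit of the lowest class, upper limit of the highest class). The function name `grouped_range` is an assumption made here; at least two classes with equal gaps between them are assumed.

```python
def grouped_range(classes):
    """Range and coefficient of range of a continuous series.

    Inclusive classes such as (5, 9), (10, 14) are first widened to
    exclusive boundaries; exclusive classes are left unchanged.
    """
    half_gap = (classes[1][0] - classes[0][1]) / 2   # 0 for exclusive classes
    smallest = classes[0][0] - half_gap              # M0
    largest = classes[-1][1] + half_gap              # M1
    r = largest - smallest
    return r, r / (largest + smallest)


inclusive = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
r, cr = grouped_range(inclusive)
print(r, round(cr, 3))
```

Frequencies are not needed at all, since the range depends only on the two outer class limits.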
Illustration 6. 


Find range and coefficient of range for the following data : 
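4, -1, -2, -4, 2, 3, 1, 5
Solution :
Here greatest item = 5
lowest item = -4
∴ Range = Greatest item - Lowest item
= 5 - (-4) = 5 + 4 = 9
Coefficient of Range $= \frac{\text{Greatest item} - \text{Lowest item}}{\text{Greatest item} + \text{Lowest item}}$
$= \frac{5 - (-4)}{5 + (-4)} = \frac{9}{1} = 9$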


Advantages of Range 

Following are the advantages of range : 

(1) Easy calculation : The calculation of the range is very simple.
No special knowledge is required for it.

(2) Knowledge of limits : The range gives clear knowledge of the
limits within which all the item values of the series are scattered.

(3) All item values are not required : Not all the item values of
the series are required for the calculation of the range. It may be
calculated on the basis of the smallest and largest item values only.

(4) Useful in quality control : Range is extremely useful in the
quality control of commodities in industry and in the study of price
fluctuations.

(5) Other uses : Range is used in other mathematical
calculations, for example, to study the change in rates of interest.


Disadvantages of Range 


As against the above merits, the range also has the following
disadvantages :

(1) Uncertain measure : Range is an uncertain measure of
dispersion, since it is determined by taking into account only two
values, i.e., the largest and the smallest.

(2) Unstable measure : Range is a very unstable measure. It
changes with any variation in the extreme values. For example, if the
maximum weight of the students of a class is 58 kg and the minimum
weight is 40 kg, then the range will be 58 - 40 = 18 kg. Now if a
student weighing 80 kg is admitted to the class, the range will
change and become 80 - 40 = 40 kg. That is why it is considered a
very unstable measure.

(3) It does not give knowledge about the structure of the
series : Proper knowledge of the structure of the series is not
obtained from the range, i.e., we cannot know how the item values are
scattered. So two series may have the same range and yet have
different compositions.

(4) Unfair estimate of frequency distribution : Range is
calculated on the basis of the extreme values only. Generally the
extreme values are not typical, so the estimate of the frequency
distribution may be unfair.

(5) It does not give importance to all items : Only the smallest and
largest values of the series are taken into account in the range and
the remaining item values have no importance, hence the chance of
getting unreliable results increases.

Main Uses of Range : Range is commonly used in the following
fields :

(i) Statistical Quality Control.

(ii) Weather forecasting.

(iii) Fluctuations in the prices of commodities, stocks and shares.


INTER QUARTILE RANGE 


The difference between the third and first quartiles of a series is
called the Inter Quartile Range.

Calculation method : First of all the third quartile ($Q_3$) and first
quartile ($Q_1$) of the series are calculated. Then the Inter
Quartile Range is determined by the following formula :

$I.Q.R. = Q_3 - Q_1$
where, I.Q.R. = Inter Quartile Range
$Q_3$ = Third Quartile
$Q_1$ = First Quartile

Coefficient of Inter Quartile Range

The coefficient of Inter Quartile Range is calculated with the help of
the following formula :

Coefficient of $I.Q.R. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Inter Quartile Range is better than range because range is
calculated on the basis of the extreme values only while the Inter
Quartile Range is calculated on the basis of the quartiles. Its
computation is very easy and it is least affected by the uncertainty
of extreme values.

But it cannot be called a representative measure of the series
either, because only 50% of the items are included in it and no
knowledge of the composition of the series is obtained from it. Hence
it is also an unstable measure of dispersion.


PERCENTILE RANGE 


Percentile range is a measure of dispersion in which the range of 80%
of the values is determined. Its calculation method is similar to that
of the inter quartile range. It is the difference between the 90th and
10th percentiles.

Calculation method : Firstly the 90th and 10th percentiles are
determined. Then the percentile range is determined with the help of
the following formula :

Percentile Range $(P.R.) = P_{90} - P_{10}$

It is also called ‘Decile Range’ ($D_9 - D_1$) because the values of
$P_{90}$ and $D_9$ are the same, as are the values of $P_{10}$ and
$D_1$. Its merits and demerits are the same as those of the range and
the Inter Quartile Range, but this method is more suitable than those
because it is based on 80% of the items.
Illustration 7.

Find Inter-quartile Range and Percentile Range from the following
data :

Height (in inches)   58   59   60   61   62   63   64   65   66
Frequency            15   20   32   35   33   22   20   10    8

Solution :

Height (in inches)   Frequency (f)   c.f.
58                   15              15
59                   20              35
60                   32              67
61                   35              102
62                   33              135
63                   22              157
64                   20              177
65                   10              187
66                   8               195
                     N = 195

$Q_1$ = size of $\frac{N+1}{4}$ th item
= size of $\frac{195+1}{4}$ th item
= size of 49th item

Since 49 first comes in c.f. 67, the item against it, i.e., 60, is the
first quartile ($Q_1$).

$Q_3$ = size of $\frac{3(N+1)}{4}$ th item
= size of $\frac{3(195+1)}{4}$ th item
= size of 147th item

Since 147 first comes in c.f. 157, the item against it, 63, is $Q_3$.

Now I.Q.R. = $Q_3 - Q_1$ = 63 - 60 = 3 inches

Similarly $P_{90}$ = size of $\frac{90(N+1)}{100}$ th item
= size of $\frac{90(195+1)}{100}$ th item
= size of 176.4th item

Since 176.4 first comes in c.f. 177, the item against it, 64, is
$P_{90}$ or $D_9$.

$P_{10}$ = size of $\frac{10(N+1)}{100}$ th item
= size of $\frac{10(195+1)}{100}$ th item
= size of 19.6th item

Since 19.6 first comes in c.f. 35, the item against it, 59, is
$P_{10}$ or $D_1$.

Now P.R. = $P_{90} - P_{10}$ = 64 - 59 = 5 inches
So I.Q.R. = 3 inches and P.R. = 5 inches
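
The positional rule used in Illustration 7 can be checked with a short Python sketch. The helper name `item_at` is an illustrative choice, not part of the text; it returns the value against the first cumulative frequency that reaches the required position.

```python
from itertools import accumulate

def item_at(values, freqs, position):
    """Return the value whose cumulative frequency first reaches `position`."""
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return value
    raise ValueError("position exceeds total frequency")

# Data of Illustration 7 (discrete series).
heights = [58, 59, 60, 61, 62, 63, 64, 65, 66]
freqs = [15, 20, 32, 35, 33, 22, 20, 10, 8]
n = sum(freqs)

q1 = item_at(heights, freqs, (n + 1) / 4)          # 49th item
q3 = item_at(heights, freqs, 3 * (n + 1) / 4)      # 147th item
p10 = item_at(heights, freqs, 10 * (n + 1) / 100)  # 19.6th item
p90 = item_at(heights, freqs, 90 * (n + 1) / 100)  # 176.4th item

iqr = q3 - q1   # inter quartile range
pr = p90 - p10  # percentile (decile) range
print(iqr, pr)
```

The sketch follows the same (N + 1) positions as the worked solution, so it should reproduce I.Q.R. = 3 inches and P.R. = 5 inches.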
Illustration 8.

Calculate Inter-quartile Range and Percentile Range from the
following data :

Year             0-10   10-20   20-30   30-40   40-50   50-60
No. of persons   4      8       15      6       4       3

Solution :

Year     No. of persons (f)   Cumulative frequency (c.f.)
0-10     4                    4
10-20    8                    12
20-30    15                   27
30-40    6                    33
40-50    4                    37
50-60    3                    40
         N = 40

Calculation of Inter Quartile Range (I.Q.R.)

$q_1$ = size of $\frac{N}{4}$ th item = size of $\frac{40}{4}$ th item
= size of 10th item

Since 10 first comes in c.f. 12, the $Q_1$ group is 10-20.

$Q_1 = L_1 + \frac{i}{f}(q_1 - c) = 10 + \frac{10}{8}(10 - 4) = 10 + 7.5 = 17.5$ years

$q_3$ = size of $\frac{3N}{4}$ th item = size of $\frac{3 \times 40}{4}$ th item
= size of 30th item

Since 30 first comes in c.f. 33, the $Q_3$ group is 30-40.

$Q_3 = L_1 + \frac{i}{f}(q_3 - c) = 30 + \frac{10}{6}(30 - 27) = 30 + 5 = 35$ years

Now I.Q.R. = $Q_3 - Q_1$ = 35 - 17.5 = 17.5 years

Calculation of Percentile Range (P.R.)

$p_{10}$ = size of $\frac{10N}{100}$ th item = size of $\frac{10 \times 40}{100}$ th item
= size of 4th item

The 4th item is the last item of the 0-10 group (c.f. 4), so $P_{10}$
falls on the boundary between the 0-10 and 10-20 groups. Taking the
10-20 group with c = 4 :

$P_{10} = L_1 + \frac{i}{f}(p_{10} - c) = 10 + \frac{10}{8}(4 - 4) = 10$ years

$p_{90}$ = size of $\frac{90N}{100}$ th item = size of $\frac{90 \times 40}{100}$ th item
= size of 36th item

Since 36 first comes in c.f. 37, the $P_{90}$ group is 40-50.

$P_{90} = L_1 + \frac{i}{f}(p_{90} - c) = 40 + \frac{10}{4}(36 - 33) = 40 + 7.5 = 47.5$ years

Now P.R. = $P_{90} - P_{10}$
= 47.5 - 10 = 37.5 years
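
As a rough check on Illustration 8, the interpolation formula L1 + (i / f)(q - c) can be sketched in Python. The function name `grouped_quantile` and its argument order are assumptions made for this example only.

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Interpolate the value at `position` in a continuous (grouped) series.

    Applies L1 + (i / f) * (position - c), where c is the cumulative
    frequency of all classes before the class containing `position`.
    """
    cumulative_before = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative_before + f >= position:
            return lower + width * (position - cumulative_before) / f
        cumulative_before += f
    raise ValueError("position exceeds total frequency")

# Data of Illustration 8.
lower_limits = [0, 10, 20, 30, 40, 50]
freqs = [4, 8, 15, 6, 4, 3]
n = sum(freqs)

q1 = grouped_quantile(lower_limits, 10, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 10, freqs, 3 * n / 4)
p10 = grouped_quantile(lower_limits, 10, freqs, 10 * n / 100)
p90 = grouped_quantile(lower_limits, 10, freqs, 90 * n / 100)

print("I.Q.R. =", q3 - q1)
print("P.R.   =", p90 - p10)
```

With these figures Q1 = 17.5 and Q3 = 35, so Q3 - Q1 = 17.5 years, and P90 - P10 = 47.5 - 10 = 37.5 years. The sketch places the 4th item in the 0-10 class, which gives the same P10 = 10 as interpolating from the lower limit of 10-20.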
Quartile Deviation or Semi-inter Quartile Range

This measure of dispersion is based on quartiles. Half the Inter
Quartile Range is called Quartile Deviation or Semi-inter Quartile
Range. We present it in the form of the following formula :

$$\text{Quartile Deviation (Q.D.)} = \frac{Q_3 - Q_1}{2}$$

Coefficient of Quartile Deviation

$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Illustration 9.

Calculate Quartile Deviation and its coefficient from the following
data :

Age in years      15   16   17   18   19   20
No. of students   4    7    12   16   11   10

Solution :

Age in years   No. of students (f)   c.f.
15             4                     4
16             7                     11
17             12                    23
18             16                    39
19             11                    50
20             10                    60
               N = 60

$Q_1$ = size of $\frac{N+1}{4}$ th item = size of $\frac{60+1}{4}$ th item
= size of 15.25th item

It first comes in c.f. 23, so the item against it, 17, is $Q_1$. So
$Q_1$ = 17 years.

$Q_3$ = size of $\frac{3(N+1)}{4}$ th item = size of $\frac{3(60+1)}{4}$ th item
= size of 45.75th item

It first comes in c.f. 50, so the item against it, 19, is $Q_3$. So
$Q_3$ = 19 years.

Now Quartile Deviation (Q.D.) = $\frac{Q_3 - Q_1}{2} = \frac{19 - 17}{2}$ = 1 year

Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{19 - 17}{19 + 17} = \frac{2}{36}$ = 0.0555 or 5.55%
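Illustration 10.

Calculate Quartile Deviation and its coefficient from the following
data :

Class       0-10   10-20   20-30   30-40   40-50   50-60
Frequency   3      18      30      22      4       3

Solution :

Class    Frequency (f)   c.f.
0-10     3               3
10-20    18              21
20-30    30              51
30-40    22              73
40-50    4               77
50-60    3               80
         N = 80

$q_1$ = size of $\frac{N}{4}$ th item = size of $\frac{80}{4}$ = 20th item
So the $Q_1$ group = 10-20

$Q_1 = L_1 + \frac{i}{f}(q_1 - c) = 10 + \frac{10}{18}(20 - 3) = 10 + \frac{10}{18} \times 17 = 19.44$

$q_3$ = size of $\frac{3N}{4}$ th item = size of $\frac{3 \times 80}{4}$ = 60th item
So the $Q_3$ group = 30-40

$Q_3 = L_1 + \frac{i}{f}(q_3 - c) = 30 + \frac{10}{22}(60 - 51) = 30 + \frac{10}{22} \times 9 = 34.09$

Quartile Deviation (Q.D.) = $\frac{Q_3 - Q_1}{2} = \frac{34.09 - 19.44}{2} = \frac{14.65}{2}$ = 7.325

Coeff. of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{34.09 - 19.44}{34.09 + 19.44} = \frac{14.65}{53.53}$ = 0.274 or 27.4%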

Illustration 11.

Find inter-quartile range and quartile deviation and its coefficient
from the following data :

Marks (Less than)   10   20   30   40    50    60    70    80    90
No. of students     5    15   98   242   367   405   425   438   439

Solution :

First we convert the 'less than' cumulative series into an ordinary
continuous series.

Marks    No. of students (f)    c.f.
0-10     (5 - 0) = 5            5
10-20    (15 - 5) = 10          15
20-30    (98 - 15) = 83         98
30-40    (242 - 98) = 144       242
40-50    (367 - 242) = 125      367
50-60    (405 - 367) = 38       405
60-70    (425 - 405) = 20       425
70-80    (438 - 425) = 13       438
80-90    (439 - 438) = 1        439
         N = 439

Calculation of Inter Quartile Range

$q_1$ = size of $\frac{N}{4}$ th item = size of $\frac{439}{4}$ = 109.75th item
So the $Q_1$ group = 30-40

$Q_1 = L_1 + \frac{i}{f}(q_1 - c) = 30 + \frac{10}{144}(109.75 - 98) = 30 + \frac{117.5}{144} = 30.82$ marks

$q_3$ = size of $\frac{3N}{4}$ th item = size of $\frac{3 \times 439}{4}$ = 329.25th item
So the $Q_3$ group = 40-50

$Q_3 = L_1 + \frac{i}{f}(q_3 - c) = 40 + \frac{10}{125}(329.25 - 242) = 40 + 0.08 \times 87.25 = 46.98$ marks

I.Q.R. = $Q_3 - Q_1$
= 46.98 - 30.82
= 16.16 marks

Quartile Deviation Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{16.16}{2}$ = 8.08 marks

Quartile Deviation Coefficient = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{16.16}{77.80}$ = 0.208
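
The two steps of Illustration 11 (turning a 'less than' series into class frequencies, then interpolating the quartiles) can be sketched as follows. The helper names are hypothetical and the class width of 10 is read off the data.

```python
def frequencies_from_less_than(cumulative):
    """Convert 'less than' cumulative frequencies into class frequencies."""
    previous = [0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]

def grouped_quantile(lower_limits, width, freqs, position):
    """Interpolate L1 + (i / f) * (position - c) in a grouped series."""
    cumulative_before = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative_before + f >= position:
            return lower + width * (position - cumulative_before) / f
        cumulative_before += f
    raise ValueError("position exceeds total frequency")

# Data of Illustration 11.
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
less_than_cf = [5, 15, 98, 242, 367, 405, 425, 438, 439]

freqs = frequencies_from_less_than(less_than_cf)
lower_limits = [u - 10 for u in upper_limits]
n = less_than_cf[-1]

q1 = grouped_quantile(lower_limits, 10, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 10, freqs, 3 * n / 4)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coefficient, 3))
```

Because the sketch keeps full precision rather than rounding the quartiles first, its last digits may differ slightly from a hand calculation.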


Exercise 8 (A)
1. Calculate the value of range and coefficient of range from the following data :
Wages (in ₹)     50   60   70   80   90   100
No. of Persons   20   35   56   44   25   10
[ Ans. : R = 50, C.R. = 0.33]
2. From the following data, calculate inter-quartile range and its coefficient :
Height (in inches)   55   56   57   58   59   60   61   62
No. of students      18   22   40   25   24   12   8    1
[ Ans. : Q1 = 56 inch, Q3 = 59 inch, I.Q.R. = 3 inch, coeff. of I.Q.R. = 0.026]
3. Find the range, inter-quartile range and its coefficient from the following data :
Marks (Less than)   10   20   30   40    50    60    70    80
No. of students     5    15   70   140   200   240   250   225
[ Ans. : R = 80 marks, C.R. = 1, I.Q.R. = 19.68 marks, coeff. of I.Q.R. = 0.254]
4. Find percentile range and inter-quartile range from the following data :
Size (more than)   20    30   40   50   60   70
Frequency          100   88   68   32   15   5
[ Ans. : P.R. = 36.67, I.Q.R. = 17.62]
5. From the following data find the quartile deviation :
S.No.           1   2    3    4    5    6    7
Size of Items   8   10   15   27   35   42   50
[ Ans. : Q1 = 10, Q3 = 42, Q.D. = 16]
6. Calculate quartile deviation and its coefficient from the following data :
x   4   5   6   7   8   9   10   11   12
f   3   2   4   1   2   3   4    2    1
[ Ans. : Q.D. = 2, coeff. of Q.D. = 0.25]
7. Calculate quartile deviation and its coefficient from the following data :
x   10   20   30   40   50   60   70
f   6    8    6    9    9    6    5
[ Ans. : Q.D. = 15, coeff. of Q.D. = 0.43]
8. Find out quartile deviation and its coefficient from the following data :
Weight (in kgs)   10-12   12-14   14-16   16-18   18-20   20-22   22-24
No. of Boxes      2       9       20      25      24      15      5
[ Ans. : Q.D. = 2.09, coeff. of Q.D. = 0.119]
9. Calculate the quartile deviation and its coefficient from the following data :
Salary (less than)   10   20   30   40   50   60   70
No. of Employees     3    8    15   20   30   33   35
[ Ans. : Q.D. = 12.59, coeff. of Q.D. = 0.374]
10. Calculate quartile deviation and its coefficient from the following data :
Size (more than)   70   60   50   40   30   20
Frequency          7    18   40   40   63   68
[ Ans. : Q.D. = 12.845, coeff. of Q.D. = 0.267]
11. Find the Quartile Deviation and its coefficient from the following data :
Age in years      15   16   17   18   19   20   21
No. of Students   4    6    10   15   12   9    4
[ Ans. : Q.D. = 1 year, Coeff. of Q.D. = 0.055]
12. Calculate Quartile Deviation and its coefficient from the data given in the
following table :
Central size of item   1   2   3    4    5    6    7    8    9   10
Frequency              2   9   11   14   20   24   20   16   5   3
[ Ans. : Q.D. = 1.504, coeff. of Q.D. = 0.267]


Mean Deviation 


Range and quartile deviation are each calculated from only two values
of the series : the extreme values in the case of range and the
quartiles in the case of quartile deviation. They are not based on all
the items. Mean deviation and standard deviation overcome this
drawback. Their calculations are done on the basis of all items of the
series.

Mean deviation of a series is the arithmetic mean of the deviations
of the various items from a statistical average. The statistical
average may be the mean, the median or the mode. Generally the median
is considered a proper representative of a series, and hence the
calculation of mean deviation with its help is considered good. If the
question does not make clear from which statistical average the mean
deviation is to be calculated, then the median is always used.

Mean deviation is also called the first moment of dispersion.


Calculation Method 


Its calculation method is very simple. First the statistical average
(mean, median or mode) from which the mean deviation is to be
determined is calculated. Then the absolute deviations of all items
of the series from this average are determined. Plus (+) and minus (-)
signs are ignored at the time of finding deviations, that is, all the
deviations are considered positive. We then calculate the arithmetic
mean of all these deviations. This is the mean deviation of the
series. The greater the mean deviation, the greater is the scatter or
spread or dispersion in the series. The steps of the calculation are
as follows :

(1) Selection of average : In the calculation of mean deviation it
must first be decided on which of the three statistical averages
(arithmetic mean, median or mode) the calculation is to be based.
Students should use the statistical average named in the problem, but
if nothing is said about it in the problem then they should always
calculate on the basis of the median. The chosen average is then
determined from the series.

(2) Determination of deviation : The deviation $|d_{\bar{X}}|$,
$|d_M|$ or $|d_Z|$ of each observation from the chosen average is
determined, ignoring the plus (+) or minus (-) signs. In an individual
series we find the total of these deviations ($\Sigma|d|$). In a
discrete series we multiply the deviations by the corresponding
frequencies and find the total of the products ($\Sigma f|d|$). In a
continuous series the deviations are determined from the mid values of
the classes, multiplied by the corresponding frequencies, and the
total of the products ($\Sigma f|d|$) is determined. The mean
deviation is then calculated by the following formulae :

Base                          Individual Series                                   Discrete or Continuous Series
Mean Deviation from Mean      $\delta_{\bar{X}} = \frac{\Sigma|d_{\bar{X}}|}{N}$   $\delta_{\bar{X}} = \frac{\Sigma f|d_{\bar{X}}|}{N}$
Mean Deviation from Median    $\delta_M = \frac{\Sigma|d_M|}{N}$                   $\delta_M = \frac{\Sigma f|d_M|}{N}$
Mean Deviation from Mode      $\delta_Z = \frac{\Sigma|d_Z|}{N}$                   $\delta_Z = \frac{\Sigma f|d_Z|}{N}$

Here delta ($\delta$) is a Greek letter. The symbol of the statistical
average about which the mean deviation is calculated is attached to
it, so that it remains clear later about which average the mean
deviation has been calculated.


Coefficient of Mean Deviation 


The coefficient of mean deviation is obtained by dividing the mean
deviation by the corresponding average (about which the mean deviation
has been calculated). The formulae are as follows :

Coefficient of Mean Deviation from Mean = $\frac{\delta_{\bar{X}}}{\bar{X}}$

Coefficient of Mean Deviation from Median = $\frac{\delta_M}{M}$

Coefficient of Mean Deviation from Mode = $\frac{\delta_Z}{Z}$

Calculation of Mean Deviation in Individual Series

Direct method : The direct method of calculating mean deviation is as
follows :

(i) The statistical average (arithmetic mean, median or mode) of
the series, about which the mean deviation is to be determined, is
calculated.

(ii) The deviations of all items of the series from the statistical
average obtained above are found. Here plus (+) and minus (-) signs
are ignored, i.e., the deviations are shown as positive only, so that
3 - 8 = 5 and 8 - 3 = 5.

(iii) The total of the deviations is found and divided by the number
of items to give the mean deviation.


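Illustration 1.
From the following data calculate mean deviation and its
coefficient from median, mean and mode :

S.No.              1    2    3    4    5    6    7    8    9    10
Height in inches   52   53   56   58   58   58   60   62   65   68

Solution :

S.No.    Height (in inches)   Deviation from Median   Deviation from Mean   Deviation from Mode
         X                    |dM| (M = 58)           |dX̄| (X̄ = 59)         |dZ| (Z = 58)
                              (+ and - signs ignored in all three columns)
1        52                   6                       7                     6
2        53                   5                       6                     5
3        56                   2                       3                     2
4        58                   0                       1                     0
5        58                   0                       1                     0
6        58                   0                       1                     0
7        60                   2                       1                     2
8        62                   4                       3                     4
9        65                   7                       6                     7
10       68                   10                      9                     10
N = 10   ΣX = 590             Σ|dM| = 36              Σ|dX̄| = 38            Σ|dZ| = 36

1. Calculation from Median

M = size of $\frac{N+1}{2}$ th item = size of $\frac{10+1}{2}$ th item = size of 5.5th item

5.5th item = $\frac{\text{size of 5th item} + \text{size of 6th item}}{2} = \frac{58 + 58}{2}$

So, M = 58

$\delta_M = \frac{\Sigma|d_M|}{N} = \frac{36}{10}$ = 3.6 inches

Coefficient of $\delta_M = \frac{\delta_M}{M} = \frac{3.6}{58}$ = 0.062

2. Calculation from Mean

$\bar{X} = \frac{\Sigma X}{N} = \frac{590}{10}$ = 59

$\delta_{\bar{X}} = \frac{\Sigma|d_{\bar{X}}|}{N} = \frac{38}{10}$ = 3.8 inches

Coefficient of $\delta_{\bar{X}} = \frac{\delta_{\bar{X}}}{\bar{X}} = \frac{3.8}{59}$ = 0.064

3. Calculation from Mode

By inspection, Z = 58

$\delta_Z = \frac{\Sigma|d_Z|}{N} = \frac{36}{10}$ = 3.6 inches

Coefficient of $\delta_Z = \frac{\delta_Z}{Z} = \frac{3.6}{58}$ = 0.062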

Illustration 2.
Find mean deviation about mode from the following table. Also
find coefficient of mean deviation :

Item        3    4    5    6    7    8
Frequency   10   15   25   20   18   12

Solution :
Here the mode is 5 (the item with the maximum frequency, 25). We have
to determine the mean deviation about it.

X       f     |X - 5| = |dZ|   f|X - 5| = f|dZ|
3       10    2                20
4       15    1                15
5       25    0                0
6       20    1                20
7       18    2                36
8       12    3                36
Total   100                    127

Mean Deviation about Mode $\delta_Z = \frac{\Sigma f|d_Z|}{N} = \frac{127}{100}$ = 1.27

Coefficient of Mean Deviation = $\frac{\delta_Z}{Z} = \frac{1.27}{5}$ = 0.254
Short-Cut Method

There is also a short-cut method to calculate mean deviation, but
it is not much used in practice. The formulae of mean deviation by
this method are as follows :

Mean deviation about arithmetic mean $\delta_{\bar{X}} = \frac{\Sigma M_A - \Sigma M_B - \bar{X}(N_A - N_B)}{N}$

Mean deviation about median $\delta_M = \frac{\Sigma M_A - \Sigma M_B - M(N_A - N_B)}{N}$

where $\Sigma M_A$ = Total of values above the corresponding average
$\Sigma M_B$ = Total of values below the corresponding average
$N_A$ = Number of items above the corresponding average
$N_B$ = Number of items below the corresponding average
N = Total No. of items
M = Median
$\bar{X}$ = Mean
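Illustration 3.
Find the coefficient of Mean Deviation from the following data by
using short-cut method :

60, 59, 61, 63, 57, 66, 68

Solution :

S.No.   X
1       57
2       59
3       60
4       61
5       63
6       66
7       68
N = 7   ΣX = 434

Calculation from Mean : $\bar{X} = \frac{\Sigma X}{N} = \frac{434}{7}$ = 62

$\Sigma M_A$ = 63 + 66 + 68 = 197
$\Sigma M_B$ = 57 + 59 + 60 + 61 = 237
$N_A$ = 3
$N_B$ = 4

Substituting the values in the formula :

$\delta_{\bar{X}} = \frac{\Sigma M_A - \Sigma M_B - \bar{X}(N_A - N_B)}{N} = \frac{197 - 237 - 62(3 - 4)}{7} = \frac{22}{7}$ = 3.143

Coefficient of Mean Deviation = $\frac{\delta_{\bar{X}}}{\bar{X}} = \frac{3.143}{62}$ = 0.051

Calculation from Median (M) : M = size of $\frac{N+1}{2}$ th item
= size of $\frac{7+1}{2}$ th item = size of 4th item = 61

Now $\Sigma M_A$ = 63 + 66 + 68 = 197
$\Sigma M_B$ = 57 + 59 + 60 = 176
$N_A$ = 3
$N_B$ = 3

$\delta_M = \frac{\Sigma M_A - \Sigma M_B - M(N_A - N_B)}{N} = \frac{197 - 176 - 61(3 - 3)}{7} = \frac{21}{7}$ = 3

Coefficient of $\delta_M = \frac{\delta_M}{M} = \frac{3}{61}$ = 0.0491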
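
A quick way to convince oneself that the short-cut formula agrees with the direct method is to compute both on the data of Illustration 3. This is only a sketch; the function names are made up for the example, and items equal to the average are left out of both the "above" and "below" groups, as in the worked solution.

```python
def md_direct(values, centre):
    """Mean of absolute deviations from `centre` (direct method)."""
    return sum(abs(x - centre) for x in values) / len(values)

def md_shortcut(values, centre):
    """Short-cut method: (sum above - sum below - centre * (N_A - N_B)) / N."""
    above = [x for x in values if x > centre]
    below = [x for x in values if x < centre]
    numerator = sum(above) - sum(below) - centre * (len(above) - len(below))
    return numerator / len(values)

# Data of Illustration 3.
data = [60, 59, 61, 63, 57, 66, 68]
mean_value = sum(data) / len(data)        # 62
median_value = sorted(data)[len(data) // 2]  # middle item of 7 sorted values, 61

for centre in (mean_value, median_value):
    print(centre, md_direct(data, centre), md_shortcut(data, centre))
```

The agreement follows because each item above the average contributes (X - average) and each item below contributes (average - X); collecting terms gives the short-cut numerator.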


Second short-cut method : A second short-cut method may also be
used to find out mean deviation. The formulae to find out mean
deviation by this method are as follows :

1. From arithmetic mean : $\delta_{\bar{X}} = \frac{\Sigma|dx| + (\bar{X} - A)(\Sigma f_B - \Sigma f_A)}{N}$

where, $\Sigma|dx|$ = Sum of deviations of all items from the assumed
mean, ignoring the plus (+) and minus (-) signs
$\bar{X}$ = Arithmetic mean
A = Assumed mean
$\Sigma f_B$ = Number of items below the mean
$\Sigma f_A$ = Number of items above the mean
N = Number of all items

2. From Median : $\delta_M = \frac{\Sigma|dM'| + (M - A)(\Sigma f_B - \Sigma f_A)}{N}$

where, $\Sigma|dM'|$ = Sum of deviations of all items taken from the
assumed mean, ignoring signs
M = Median
A = Assumed mean
The rest of the notation is as in case (1).

3. From Mode : $\delta_Z = \frac{\Sigma|dz'| + (Z - A)(\Sigma f_B - \Sigma f_A)}{N}$

where, $\Sigma|dz'|$ = Sum of deviations from the assumed mean, ignoring
positive (+) and negative (-) signs
Z = Mode
A = Assumed mean

When the value of the statistical average is not a whole number, this
method is used for convenience in calculation, taking a whole number
as the assumed mean.


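Illustration 4.

Find mean deviation from the data of Illustration 3 by the second or
alternative short-cut method.

Solution :

S.No.   X     |dx| (A = 61)
1       57    4
2       59    2
3       60    1
4       61    0
5       63    2
6       66    5
7       68    7
N = 7   ΣX = 434   Σ|dx| = 21

$\bar{X} = \frac{434}{7}$ = 62

Items below the mean : 57, 59, 60, 61, so $\Sigma f_B$ = 4
Items above the mean : 63, 66, 68, so $\Sigma f_A$ = 3

$\delta_{\bar{X}} = \frac{\Sigma|dx| + (\bar{X} - A)(\Sigma f_B - \Sigma f_A)}{N} = \frac{21 + (62 - 61)(4 - 3)}{7} = \frac{22}{7}$ = 3.143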


Calculation of Mean Deviation in Discrete Series

Direct method : The calculation method of mean deviation in a discrete
series is as follows :

(1) First of all we find the statistical average from which the
calculation is to be done.

(2) We find the deviations of all items from the above average,
ignoring the plus (+) and minus (-) signs.

(3) We multiply each deviation by the corresponding frequency
and find the total of the products ($\Sigma f|d|$).

(4) We find the mean deviation by using the following formulae :

$\delta_{\bar{X}} = \frac{\Sigma f|d_{\bar{X}}|}{N}$,   $\delta_M = \frac{\Sigma f|d_M|}{N}$,   $\delta_Z = \frac{\Sigma f|d_Z|}{N}$

(5) The coefficient of mean deviation may be determined by dividing
the mean deviation by the related statistical average.


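Illustration 5.
Find out mean deviation and its coefficient from mean, median and mode from
the following data :

Size of item   5   10   15   20   25   30
Frequencies    2   6    9    7    3    3

Solution :

                                 From Mean (X̄ = 17)   From Median (M = 15)   From Mode (Z = 15)
Size   Frequency   fX    c.f.    |dX̄|    f|dX̄|        |dM|    f|dM|          |dZ|    f|dZ|
X      f
5      2           10    2       12      24           10      20             10      20
10     6           60    8       7       42           5       30             5       30
15     9           135   17      2       18           0       0              0       0
20     7           140   24      3       21           5       35             5       35
25     3           75    27      8       24           10      30             10      30
30     3           90    30      13      39           15      45             15      45
       N = 30      ΣfX = 510             Σf|dX̄| = 168         Σf|dM| = 160           Σf|dZ| = 160

Calculation from Mean

$\bar{X} = \frac{\Sigma fX}{N} = \frac{510}{30}$ = 17

$\delta_{\bar{X}} = \frac{\Sigma f|d_{\bar{X}}|}{N} = \frac{168}{30}$ = 5.6

Coefficient of $\delta_{\bar{X}} = \frac{\delta_{\bar{X}}}{\bar{X}} = \frac{5.6}{17}$ = 0.329

Calculation from Median

M = size of $\frac{30+1}{2}$ th item = size of 15.5th item = 15

$\delta_M = \frac{\Sigma f|d_M|}{N} = \frac{160}{30}$ = 5.33

Coefficient of $\delta_M = \frac{\delta_M}{M} = \frac{5.33}{15}$ = 0.355

Calculation from Mode

Z = item with the maximum frequency = 15, by inspection

$\delta_Z = \frac{\Sigma f|d_Z|}{N} = \frac{160}{30}$ = 5.33

Coefficient of $\delta_Z = \frac{\delta_Z}{Z} = \frac{5.33}{15}$ = 0.355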
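
For a discrete series the same idea carries over with frequencies as weights. The sketch below reworks Illustration 5; the function name and the way the median and mode are located (first cumulative frequency reaching (N + 1)/2, and the item with the largest frequency) are written out for this example.

```python
from itertools import accumulate

def mean_deviation(values, freqs, centre):
    """Frequency-weighted mean of absolute deviations from `centre`."""
    total = sum(f * abs(x - centre) for x, f in zip(values, freqs))
    return total / sum(freqs)

# Data of Illustration 5.
sizes = [5, 10, 15, 20, 25, 30]
freqs = [2, 6, 9, 7, 3, 3]
n = sum(freqs)

mean_value = sum(x * f for x, f in zip(sizes, freqs)) / n
median_position = (n + 1) / 2
median_value = next(
    x for x, cf in zip(sizes, accumulate(freqs)) if cf >= median_position
)
mode_value = sizes[freqs.index(max(freqs))]

md_mean = mean_deviation(sizes, freqs, mean_value)
md_median = mean_deviation(sizes, freqs, median_value)
md_mode = mean_deviation(sizes, freqs, mode_value)
print(md_mean, round(md_median, 2), round(md_mode, 2))
```

Dividing each result by its own average gives the corresponding coefficient.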

Short-Cut Method

The short-cut calculation method of mean deviation in a discrete
series is as follows :

1. First we find the statistical average from which the mean
deviation is to be calculated.

2. Each value of the series is multiplied by its frequency.

3. The total of the products of the values above the related average
and their frequencies ($\Sigma Mf_A$) is determined.

4. The total of the products of the values below the related average
and their frequencies ($\Sigma Mf_B$) is determined.

5. If any value is equal to the related average then the product of
that value and its frequency is left out.

6. We calculate the total of the frequencies ($\Sigma f_A$)
corresponding to the values above the related average and the total of
the frequencies ($\Sigma f_B$) corresponding to the values below the
related average.

7. At last, the mean deviation is determined by using one of the
following formulae :

$\delta_{\bar{X}} = \frac{\Sigma Mf_A - \Sigma Mf_B - \bar{X}(\Sigma f_A - \Sigma f_B)}{N}$
$\delta_M = \frac{\Sigma Mf_A - \Sigma Mf_B - M(\Sigma f_A - \Sigma f_B)}{N}$
$\delta_Z = \frac{\Sigma Mf_A - \Sigma Mf_B - Z(\Sigma f_A - \Sigma f_B)}{N}$

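Illustration 6.
Calculate mean deviation and its coefficient from mean and median from the
following data by using short-cut method :

Size        2   4   6   8   10   12   14
Frequency   2   1   3   6   4    3    1

Solution :

Size (X)   Frequency (f)   fX          c.f.
2          2               4           2
4          1               4           3
6          3               18          6
8          6               48          12
10         4               40          16
12         3               36          19
14         1               14          20
           N = 20          ΣfX = 164

Calculation from Mean

$\bar{X} = \frac{\Sigma fX}{N} = \frac{164}{20}$ = 8.2

$\Sigma Mf_A$ = 40 + 36 + 14 = 90
$\Sigma Mf_B$ = 48 + 18 + 4 + 4 = 74
$\Sigma f_A$ = 4 + 3 + 1 = 8
$\Sigma f_B$ = 6 + 3 + 1 + 2 = 12

$\delta_{\bar{X}} = \frac{\Sigma Mf_A - \Sigma Mf_B - \bar{X}(\Sigma f_A - \Sigma f_B)}{N} = \frac{90 - 74 - 8.2(8 - 12)}{20} = \frac{48.8}{20}$ = 2.44

Calculation from Median

M = size of $\frac{20+1}{2}$ th item
= size of 10.5th item

10.5 lies in c.f. 12. Thus, M = 8

$\Sigma Mf_A$ = 40 + 36 + 14 = 90
$\Sigma Mf_B$ = 18 + 4 + 4 = 26
$\Sigma f_A$ = 4 + 3 + 1 = 8
$\Sigma f_B$ = 3 + 1 + 2 = 6

$\delta_M = \frac{\Sigma Mf_A - \Sigma Mf_B - M(\Sigma f_A - \Sigma f_B)}{N} = \frac{90 - 26 - 8(8 - 6)}{20} = \frac{48}{20}$ = 2.4

Coefficient of $\delta_M = \frac{\delta_M}{M} = \frac{2.4}{8}$ = 0.3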
Calculation of Mean Deviation in Continuous Series

The mean deviation in a continuous series is determined in the same
way as in a discrete series. We find the mid values of the classes,
treat them as the values of the items (X) and determine all deviations
from them. It is essential to point out that the mean, median or mode
is first calculated by the methods for a continuous series. After
that, the deviations for the mean deviation are determined from this
average by taking the mid values of the classes.


Illustration 7.
Compute the mean deviation and its coefficient for the scores of college
students :

Scores      140-150   150-160   160-170   170-180   180-190   190-200
Frequency   4         6         10        18        9         3

Solution :

The question does not state clearly from which statistical average
the mean deviation is to be calculated. In such a situation we always
calculate from the median.

Marks     Mid Value   Freq.   Cumulative   Deviation from Median   Product
          m.v. (X)    f       Frequency    |dM| (M = 172.78)       f|dM|
                              c.f.
140-150   145         4       4            27.78                   111.12
150-160   155         6       10           17.78                   106.68
160-170   165         10      20           7.78                    77.80
170-180   175         18      38           2.22                    39.96
180-190   185         9       47           12.22                   109.98
190-200   195         3       50           22.22                   66.66
                      N = 50                                       Σf|dM| = 512.2

m = size of $\frac{N}{2}$ th item
= size of $\frac{50}{2}$ = 25th item

It lies in c.f. 38, so the median group is (170-180).

$M = L_1 + \frac{i}{f}(m - c)$
$= 170 + \frac{10}{18}(25 - 20)$
$= 170 + \frac{50}{18} = 172.78$

$\delta_M = \frac{\Sigma f|d_M|}{N} = \frac{512.2}{50}$ = 10.244 marks

Coefficient of $\delta_M = \frac{\delta_M}{M} = \frac{10.244}{172.78}$ = 0.059
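
The continuous-series procedure of Illustration 7 (interpolated median, then deviations of the class mid values from it) can be sketched as follows. The function name `grouped_median` is an assumption for this example, and the sketch keeps the unrounded median, so its result may differ from the hand calculation in the later decimal places.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous series: L1 + (i / f) * (N/2 - c)."""
    position = sum(freqs) / 2
    cumulative_before = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative_before + f >= position:
            return lower + width * (position - cumulative_before) / f
        cumulative_before += f
    raise ValueError("empty frequency table")

# Data of Illustration 7.
lower_limits = [140, 150, 160, 170, 180, 190]
width = 10
freqs = [4, 6, 10, 18, 9, 3]

mid_values = [lower + width / 2 for lower in lower_limits]
median_value = grouped_median(lower_limits, width, freqs)

# Mean deviation: frequency-weighted absolute deviations of the mid values.
md = sum(f * abs(x - median_value) for x, f in zip(mid_values, freqs)) / sum(freqs)
coefficient = md / median_value
print(round(median_value, 2), round(md, 3), round(coefficient, 3))
```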
Illustration 8.

Find mean deviation and its coefficient from the following
frequency distribution :

Class       0-10   10-20   20-30   30-40   40-50   50-60
Frequency   4      6       12      18      7       3

Solution :

Class    Mid value   Frequency   Deviation from        fdx     Deviation from        Product
         X           f           assumed mean                  mean |dX̄|             f|dX̄|
                                 dx (A = 30)                   (X̄ = 30.4)
0-10     5           4           -25                   -100    25.4                  101.6
10-20    15          6           -15                   -90     15.4                  92.4
20-30    25          12          -5                    -60     5.4                   64.8
30-40    35          18          5                     90      4.6                   82.8
40-50    45          7           15                    105     14.6                  102.2
50-60    55          3           25                    75      24.6                  73.8
                     N = 50                            Σfdx = 20                     Σf|dX̄| = 517.6

$\bar{X} = A + \frac{\Sigma fdx}{N} = 30 + \frac{20}{50} = 30 + 0.4 = 30.4$

$\delta_{\bar{X}} = \frac{\Sigma f|d_{\bar{X}}|}{N} = \frac{517.6}{50}$ = 10.352

Coefficient of $\delta_{\bar{X}} = \frac{\delta_{\bar{X}}}{\bar{X}} = \frac{10.352}{30.4}$ = 0.341


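Illustration 9.

Find mean deviation and Quartile deviation of the following distribution. Also find
their coefficients :

Class interval   0-6   6-12   12-18   18-24   24-30
Frequency        8     10     12      9       5

Solution :

Class      Frequency   c.f.   Mid value      |dM|       f|dM|
Interval                      of C.I. (X)    (M = 14)
0-6        8           8      3              11         88
6-12       10          18     9              5          50
12-18      12          30     15             1          12
18-24      9           39     21             7          63
24-30      5           44     27             13         65
Total      44                                           278

$\frac{N}{4} = \frac{44}{4}$ = 11. First quartile class is 6-12.

$Q_1 = L + \frac{\frac{N}{4} - C}{f} \times i$ (where f = 10, C = 8, i = 6, L = 6)
$= 6 + \frac{11 - 8}{10} \times 6 = 6 + \frac{18}{10} = 6 + 1.8 = 7.8$

$\frac{3N}{4} = \frac{3 \times 44}{4}$ = 33. Third quartile class is 18-24.

$Q_3 = L + \frac{\frac{3N}{4} - C}{f} \times i$ (where f = 9, C = 30, i = 6, L = 18)
$= 18 + \frac{33 - 30}{9} \times 6 = 18 + \frac{18}{9} = 18 + 2 = 20$

Quartile Deviation Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{20 - 7.8}{2} = \frac{12.2}{2}$ = 6.1

$\frac{N}{2} = \frac{44}{2}$ = 22

The first c.f. greater than 22 is 30, so 12-18 is the median class.

Median = $L + \frac{\frac{N}{2} - C}{f} \times i = 12 + \frac{22 - 18}{12} \times 6 = 12 + \frac{24}{12} = 12 + 2 = 14$

Mean Deviation from Median = $\frac{\Sigma f|d_M|}{N} = \frac{278}{44}$ = 6.318

Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{12.2}{27.8}$ = 0.4388

Coefficient of Mean Deviation = $\frac{\text{Mean Deviation}}{\text{Median}} = \frac{6.318}{14}$ = 0.4513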


Illustration 10.
Find mean deviation and its coefficient from the mode for the
following data :

Marks (Less than)   10   20   30   40   50
No. of students     5    16   32   42   50

Solution :
First we change the 'less than' cumulative series into a continuous
series.

Marks    Frequency           Mid value   Deviation from   Product
         f                   X           Mode |dZ|        f|dZ|
0-10     (5 - 0) = 5         5           19.54            97.70
10-20    (16 - 5) = 11       15          9.54             104.94
20-30    (32 - 16) = 16      25          0.46             7.36
30-40    (42 - 32) = 10      35          10.46            104.60
40-50    (50 - 42) = 8       45          20.46            163.68
         N = 50                                           Σf|dZ| = 478.28

By inspection 20-30 is the modal class.

So, $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$
$= 20 + \frac{16 - 11}{32 - 11 - 10}(30 - 20) = 20 + \frac{5}{11} \times 10 = 24.54$

Mean Deviation $\delta_Z = \frac{\Sigma f|d_Z|}{N} = \frac{478.28}{50}$ = 9.57 marks

Coefficient of $\delta_Z = \frac{\delta_Z}{Z} = \frac{9.57}{24.54}$ = 0.39
Illustration 11.
Calculate mean deviation from arithmetic mean and median for
the following data :

Mid-value   25   35   45   55   65
Frequency   4    7    12   5    2

Solution :

Class    Mid     Freq-   Cumulative   dx         fdx    Calculation from mean   Calculation from median
         value   uency   freq.        (A = 45)          (X̄ = 43)                (M = 43.33)
         X       f       (c.f.)                         |dX̄|    f|dX̄|           |dM|     f|dM|
20-30    25      4       4            -20        -80    18      72              18.33    73.32
30-40    35      7       11           -10        -70    8       56              8.33     58.31
40-50    45      12      23           0          0      2       24              1.67     20.04
50-60    55      5       28           10         50     12      60              11.67    58.35
60-70    65      2       30           20         40     22      44              21.67    43.34
                 N = 30               Σfdx = -60        Σf|dX̄| = 256            Σf|dM| = 253.36

Calculation from Mean

$\bar{X} = A + \frac{\Sigma fdx}{N} = 45 + \frac{-60}{30} = 45 - 2 = 43$

Mean Deviation $\delta_{\bar{X}} = \frac{\Sigma f|d_{\bar{X}}|}{N} = \frac{256}{30}$ = 8.53

Coefficient of $\delta_{\bar{X}} = \frac{\delta_{\bar{X}}}{\bar{X}} = \frac{8.53}{43}$ = 0.198

Calculation from Median

m = size of $\frac{N}{2}$ th item = size of $\frac{30}{2}$ th item = size of 15th item

It first comes in c.f. 23, so the median group is 40-50.

Now, $M = L_1 + \frac{i}{f}(m - c) = 40 + \frac{10}{12}(15 - 11) = 40 + \frac{40}{12} = 43.33$

Mean Deviation $\delta_M = \frac{\Sigma f|d_M|}{N} = \frac{253.36}{30}$ = 8.445

Coefficient of $\delta_M = \frac{\delta_M}{M} = \frac{8.445}{43.33}$ = 0.195
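Illustration 12.

Calculate mean deviation by short cut method from the following
data :

Marks       0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency   8      12      20      35      18      13      4

Solution :

Marks    M.V.   f     c.f.   fX
         X
0-10     5      8     8      40
10-20    15     12    20     180
20-30    25     20    40     500
30-40    35     35    75     1225
40-50    45     18    93     810
50-60    55     13    106    715
60-70    65     4     110    260
                N = 110

m = size of $\frac{N}{2}$ th item = size of $\frac{110}{2}$ = 55th item

It lies in c.f. 75, so the median group is (30-40).

Now, $M = L_1 + \frac{i}{f}(m - c) = 30 + \frac{10}{35}(55 - 40) = 30 + \frac{150}{35} = 34.29$

Mid values below the median (5, 15, 25) :
$\Sigma Mf_B$ = 40 + 180 + 500 = 720
$\Sigma f_B$ = 8 + 12 + 20 = 40

Mid values above the median (35, 45, 55, 65) :
$\Sigma Mf_A$ = 1225 + 810 + 715 + 260 = 3010
$\Sigma f_A$ = 35 + 18 + 13 + 4 = 70

$\delta_M = \frac{\Sigma Mf_A - \Sigma Mf_B - M(\Sigma f_A - \Sigma f_B)}{N} = \frac{3010 - 720 - 34.29(70 - 40)}{110} = \frac{1261.3}{110}$ = 11.466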
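Illustration 13.

Find mean deviation from mode by using short-cut method from the following
data :

Wages            10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of workers   2       4       4       9       6       3       2

Solution :

Wages    No. of workers   Mid value   Product
         f                X           (fX)
10-20    2                15          30
20-30    4                25          100
30-40    4                35          140
40-50    9                45          405
50-60    6                55          330
60-70    3                65          195
70-80    2                75          150
         N = 30

By inspection the modal class = 40-50

Now, $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (L_2 - L_1)$
$= 40 + \frac{9 - 4}{18 - 4 - 6}(50 - 40) = 40 + \frac{50}{8} = 46.25$

Mid values below the mode (15, 25, 35, 45) :
$\Sigma f_B$ = 2 + 4 + 4 + 9 = 19,  $\Sigma Mf_B$ = 30 + 100 + 140 + 405 = 675

Mid values above the mode (55, 65, 75) :
$\Sigma f_A$ = 6 + 3 + 2 = 11,  $\Sigma Mf_A$ = 330 + 195 + 150 = 675

Mean Deviation $\delta_Z = \frac{\Sigma Mf_A - \Sigma Mf_B - Z(\Sigma f_A - \Sigma f_B)}{N} = \frac{675 - 675 - 46.25(11 - 19)}{30} = \frac{370}{30}$ = 12.33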
Advantages of Mean Deviation

(1) Based on all values : Mean deviation is based on all item
values of the series, hence it represents the composition of the
series clearly.

(2) Little effect of extreme values : A change in the extreme
values has less effect on this measure of dispersion.

(3) Easy in calculation : Its calculation is very easy.

(4) Calculation from any average : Mean deviation can be calculated
from any average : arithmetic mean, median or mode.

(5) Easy to understand : Mean deviation can be easily
understood.

(6) Determination of limits in Normal Distribution : In a normal
distribution, the limits of the items of the series may be determined
on the basis of the mean and the mean deviation.

Generally 57.5% of the items are included in $\bar{X} \pm \delta_{\bar{X}}$ or $M \pm \delta_M$.

Disadvantages of Mean Deviation

(1) Ignorance of Signs : Plus (+) and minus (-) signs are ignored
when determining the deviations of the items from the statistical
average for the calculation of mean deviation, that is, all deviations
are assumed positive. This is a serious mathematical error.

(2) Not capable of Algebraic Treatment : Because the signs are
ignored, it is not an adequate measure from the mathematical point of
view. Hence it cannot be used in further algebraic treatment.

(3) Unreliability : Mean deviation can be calculated from any
statistical average. When it is calculated from the mode it becomes
unreliable because the mode itself is unreliable.

(4) Uncertainty : It can be calculated from different statistical
averages, and different values are obtained from different averages.
Because of this, uncertainty remains in the measure.

Utility : In spite of the above disadvantages this measure is very
useful for small samples. It is widely used in economic, social and
applied fields. The National Bureau of Economic Research uses mean
deviation to find the variability of business cycles. This measure is
also used to study the heterogeneity of income.


Exercise 8 (B)

1. The sizes of seven farms are given below. Calculate mean deviation from the mean
and median :
Size of farms : 30, 25, 29, 20, 35, 45, 44
[ Ans. : 7.51, 7.14]
2. Calculate mean deviation from median and its coefficient from the following data :
Roll Number      1    2    3    4    5    6    7    8    9
Marks obtained   54   71   57   52   49   45   72   57   47
[ Ans. : δM = 8.33, Coefficient = 0.17]
3. Find out the mean deviation about median for the following distribution :
Value of variable   6   12   18   24   30   36   42
Frequency           4   7    9    18   15   10   5
[ Ans. : M = 24, δM = 7.5]
4. Calculate (a) median coefficient of dispersion, and (b) mean coefficient of
dispersion from the following data :
Size        4   6   8   10   12   14   16
Frequency   2   4   5   3    2    1    4
[ Ans. : M = 8, δM = 3.24, Coefficient = 0.405; X̄ = 9.71, δX̄ = 3.32,
coefficient = 0.34]
5. Find out the mean deviation and its coefficient for the following :
Weekly wages (Rs.)   2-4   4-6   6-8   8-10
Workers              20    40    30    10
[ Ans. : M = 5.5, δM = 1.50, Coefficient of M.D. = 0.273]
6. Calculate mean deviation from median for the following data :
Class       0-6   6-12   12-18   18-24   24-30
Frequency   8     10     12      9       6
[ Ans. : M = 14.25, δM = 6.42]
7. Calculate mean deviation from mean of the following data :
Size        3-4   4-5   5-6   6-7   7-8   8-9   9-10
Frequency   3     7     22    60    85    32    8
[ Ans. : X̄ = 7.1, δX̄ = 0.913]
8. Find out mean deviation through mean and its coefficient from the following
data :
Marks       10-20   10-30   10-40   10-50   10-60   10-70   10-80   10-90
Frequency   15      25      37      37      58      69      80      100
[ Ans. : X̄ = 52.9, Mean Deviation = 20.805, Coefficient of M.D. = 0.39]
9. Calculate mean deviation about median and its coefficient from the following
frequency distribution :
Class       1-10   11-20   21-30   31-40   41-50
Frequency   4      6       10      7       3
[ Ans. : M = 25.5, δM = 9, Coefficient = 0.35]
10. Find mean deviation from mean and its coefficient for the scores of the college
students :
Score       140-150   150-160   160-170   170-180   180-190   190-200
Frequency   4         6         10        18        9         3
[ Ans. : X̄ = 171.2 marks, δX̄ = 10.56 marks, coeff. of δX̄ = 0.062]
11. From the following data calculate mean deviation from median and its
coefficient :
Age in years     1-5   6-10   11-15   16-20   21-25   26-30   31-35   36-40   41-45
No. of Persons   7     10     16      32      24      18      10      5       1
[ Ans. : M = 19.95 years, δM = 7.103, coeff. of δM = 0.36]
12. Calculate mean deviation from mean from the following data :
Marks             0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of students   4      6       10      20      10      6       4
[ Ans. : X̄ = 35, δX̄ = 11.33]
13. Calculate mean deviation from median from the following data :
Marks (Less than)   80    70   60   50   40   30   20   10
No. of students     100   90   80   60   32   20   13   5
[ Ans. : M = 46.43, δM = 14.286]


Standard Deviation 

Standard deviation is the most popular and suitable measure of
dispersion. Karl Pearson gave its concept for the first time in 1893.
It is considered the most important and scientific measure of
dispersion. It is free from the disadvantages of the other measures of
dispersion. It is calculated on the basis of the arithmetic mean. In
its calculation the plus (+) and minus (-) signs are not ignored;
instead all deviations are made positive by squaring them. Thus there
is no inadequacy in it from the mathematical point of view. The square
root of the simple mean of the squares of these deviations is called
the standard deviation. So it may be defined as :

“Standard deviation is the square root of the arithmetic mean of
the squares of deviations of items from the arithmetic mean.”

Standard deviation is also called the second moment of dispersion.

The Greek letter $\sigma$ (sigma) is used as the symbol to denote it.

Coefficient of Standard Deviation

Standard deviation is also an absolute measure like the other
measures of dispersion, so the unit of the items is written with it.
To convert it into a relative measure the coefficient of standard
deviation is calculated. For this the standard deviation is divided by
the arithmetic mean. Its formula is as follows :

$$\text{Coefficient of Standard Deviation} = \frac{\sigma}{\bar{X}}$$

The coefficient of variation is obtained by converting it into a
percentage :

$$\text{Coefficient of Variation (C.V.)} = \frac{\sigma}{\bar{X}} \times 100$$

Variance : The square of the standard deviation is called variance.
Hence variance = $\sigma^2$.


METHOD OF CALCULATING STANDARD DEVIATION

(A) Method of Calculating Standard Deviation in Individual Series

(1) Direct Method (2) Short-cut Method
(3) Step-deviation Method (4) Square of Values Method

(1) Calculation of standard deviation by direct method :
The calculation is done by the direct method in the following way :
(i) First the arithmetic mean of the series is determined.
(ii) The deviation ( d ) of each item from the mean calculated above
is determined.
$d = (X - \bar{X})$ (here the '+' or '-' signs are to be noted)
(iii) These deviations are squared and their sum ($\Sigma d^2$) is found.
(iv) Standard deviation is determined with the help of the following
formula :

$$\text{S.D. or } \sigma = \sqrt{\frac{\Sigma d^2}{N}}$$
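Illustration 1.
Marks obtained by 8 students are as follows. Find standard deviation and its
coefficient :
20 25 40 45 23 33 34 28

Solution :

Marks      Deviation from Mean   Square of Deviations
X          d (X̄ = 31)            d²
20         -11                   121
25         -6                    36
40         9                     81
45         14                    196
23         -8                    64
33         2                     4
34         3                     9
28         -3                    9
ΣX = 248                         Σd² = 520

$\bar{X} = \frac{\Sigma X}{N} = \frac{248}{8} = 31$

$\sigma = \sqrt{\frac{\Sigma d^2}{N}} = \sqrt{\frac{520}{8}} = \sqrt{65}$ = 8.06 marks

Coefficient of S.D. = $\frac{\sigma}{\bar{X}} = \frac{8.06}{31}$ = 0.26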
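
The direct method of Illustration 1 can be verified with a few lines of Python. The variable names are chosen for the example; `statistics.pstdev` from the standard library computes the same population standard deviation (dividing by N) and is used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev

# Data of Illustration 1.
marks = [20, 25, 40, 45, 23, 33, 34, 28]
n = len(marks)

mean_value = sum(marks) / n                         # step (i)
deviations = [x - mean_value for x in marks]        # step (ii), signs kept
sum_of_squares = sum(d * d for d in deviations)     # step (iii)
sd = sqrt(sum_of_squares / n)                       # step (iv)

coefficient = sd / mean_value
cv = coefficient * 100  # coefficient of variation, in per cent
print(round(sd, 2), round(coefficient, 2), round(cv, 1))
print(abs(sd - pstdev(marks)) < 1e-9)  # agrees with the library routine
```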

(2) Calculation of standard deviation by short-cut method : When the arithmetic mean is a whole number, it is easy to use the direct method in an individual series. If it is not, the calculation by the short-cut method is easier. In the short-cut method the calculation is done as follows :

(i) The deviation ( dx ) of every item of the series from an assumed mean ( A ) is determined, and the sum of all these deviations ( Σdx ) is found.

dx = ( X - A ) (here signs are to be noted)
(ii) Each deviation is squared separately and the sum of the squares ( Σdx² ) is determined.
(iii) Standard deviation is calculated by using any one of the following formulae :

(a) $\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2}$


(c) σ =

All these formulae are different forms of the same formula, obtained from one another by algebraic rearrangement. Out of these formulae, the first formula (a) is the one most frequently used.


Illustration 2. 
Calculate standard deviation by the short-cut method from the following data :
132 130 135 140 138 128 150 149 133 142

Solution :

X      Deviation from assumed mean dx ( A = 140 )    dx²
132    -8                                            64
130    -10                                           100
135    -5                                            25
140    0                                             0
138    -2                                            4
128    -12                                           144
150    10                                            100
149    9                                             81
133    -7                                            49
142    2                                             4
ΣX = 1377    Σdx = -23                               Σdx² = 571

Standard Deviation $\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2}$
$= \sqrt{\frac{571}{10} - \left(\frac{-23}{10}\right)^2} = \sqrt{57.1 - 5.29} = \sqrt{51.81} = 7.198$
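A short Python sketch of the short-cut method, applied to the data of Illustration 2, is given below. The function name `std_dev_short_cut` is hypothetical; the code simply applies formula (a) with an assumed mean A.

```python
import math

def std_dev_short_cut(values, assumed_mean):
    """Standard deviation from deviations about an assumed mean (formula (a))."""
    n = len(values)
    dx = [x - assumed_mean for x in values]
    sum_dx = sum(dx)
    sum_dx2 = sum(d * d for d in dx)
    sigma = math.sqrt(sum_dx2 / n - (sum_dx / n) ** 2)
    return sum_dx, sum_dx2, sigma

values = [132, 130, 135, 140, 138, 128, 150, 149, 133, 142]
sum_dx, sum_dx2, sigma = std_dev_short_cut(values, 140)
print(sum_dx, sum_dx2, round(sigma, 3))

# Any other assumed mean gives the same standard deviation.
_, _, sigma_other = std_dev_short_cut(values, 100)
```

The choice of assumed mean changes Σdx and Σdx² but not the final result, which is why a convenient round number is normally taken for A.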
(3) Calculation of standard deviation by step deviation method : If the deviations ( dx ) from the assumed mean in the short-cut method come out to be large, then squaring them and the other calculations become difficult. In such a situation all values of dx are reduced by dividing them by a common number. We denote the reduced deviation by the symbol dx' or ds. Now we find the total of dx' ( Σdx' ), get the square of dx' for each item ( dx'² ) and then get their total ( Σdx'² ). Standard deviation is calculated with the help of the following formula :

$\sigma = \sqrt{\frac{\sum dx'^2}{N} - \left(\frac{\sum dx'}{N}\right)^2} \times i$

Here i is the common number by which all deviations ( dx ) are divided.

(4) Calculation of standard deviation by square of values method : There is no need of getting deviations in this method. Under it, the square of each item of the series is determined separately. After that we find the total of the items ( ΣX ) and the total of the squares of the items ( ΣX² ) and use the following formula to calculate standard deviation :

Standard deviation $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$


Illustration 3. 
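Find standard deviation and its coefficient by the 'square of values method' from the following data :
X : 30 48 32 40 45 38 50 41 34 42
Solution :
X      X²
30     900
48     2304
32     1024
40     1600
45     2025
38     1444
50     2500
41     1681
34     1156
42     1764
ΣX = 400    ΣX² = 16398

Standard Deviation $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{16398}{10} - \left(\frac{400}{10}\right)^2}$

$= \sqrt{1639.8 - 1600} = \sqrt{39.8} = 6.31$

Arithmetic Mean $\bar{X} = \frac{\sum X}{N} = \frac{400}{10} = 40$

Coefficient of S.D. = $\frac{\sigma}{\bar{X}} = \frac{6.31}{40} = 0.158$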
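The square of values method lends itself to a very short program, because only two running totals are needed. The sketch below uses the data of Illustration 3; the function name is our own assumption.

```python
import math

def std_dev_square_of_values(values):
    """Standard deviation from the totals of X and X^2, without any deviations."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    sigma = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)
    return sum_x, sum_x2, sigma

data = [30, 48, 32, 40, 45, 38, 50, 41, 34, 42]
sum_x, sum_x2, sigma = std_dev_square_of_values(data)
coefficient = sigma / (sum_x / len(data))
print(sum_x, sum_x2, round(sigma, 2), round(coefficient, 3))
```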

(B) Method of Calculating Standard Deviation in Discrete Series
Calculation of standard deviation by direct method :
(1) The arithmetic mean of the series is determined.
(2) The deviation ( d ) of each item from the arithmetic mean is determined : d = ( X - $\bar{X}$ ) (plus (+) and minus (-) signs are to be noted)
(3) The square of each deviation so obtained is determined, i.e., d².
(4) By multiplying each square of deviation ( d² ) by its frequency ( f ) we find fd².
(5) The total of all products ( Σfd² ) is determined. Then standard deviation is calculated by using the following formula :

$\sigma = \sqrt{\frac{\sum fd^2}{N}}$ or $\sqrt{\frac{\sum fd^2}{\sum f}}$

N = Σf for a series having frequencies.
Illustration 4. 

From the following data, find standard deviation :

X : 2 4 6 8 10 12
f : 7 13 18 9 6 2
Solution :
X     f     fX     Deviation from Mean d ( $\bar{X}$ = 6 )    Square of Deviation d²    Product fd²
2     7     14     -4                                   16                        112
4     13    52     -2                                   4                         52
6     18    108    0                                    0                         0
8     9     72     2                                    4                         36
10    6     60     4                                    16                        96
12    2     24     6                                    36                        72
N = 55      ΣfX = 330                                   Σd² = 76                  Σfd² = 368

$\bar{X} = \frac{\sum fX}{N} = \frac{330}{55} = 6$

Standard Deviation $\sigma = \sqrt{\frac{\sum fd^2}{N}} = \sqrt{\frac{368}{55}} = 2.587$
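For a discrete series the same idea is applied with each squared deviation weighted by its frequency. The following sketch reproduces Illustration 4; `std_dev_discrete` is a name introduced here for illustration only.

```python
import math

def std_dev_discrete(values, freqs):
    """Direct method for a discrete series: sigma = sqrt(sum(f * d^2) / N)."""
    n = sum(freqs)                                             # N = total frequency
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return n, mean, sum_fd2, math.sqrt(sum_fd2 / n)

x = [2, 4, 6, 8, 10, 12]
f = [7, 13, 18, 9, 6, 2]
n, mean, sum_fd2, sigma = std_dev_discrete(x, f)
print(n, mean, sum_fd2, round(sigma, 3))
```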
Calculation of Standard Deviation by Short-cut Method

In this method the standard deviation is calculated in the following steps :

(i) The deviation ( dx ) of each item in the series from an assumed mean ( A ) is determined.

dx = X - A (here plus (+) and minus (-) signs are to be noted)

(ii) By multiplying the deviations by their frequencies, the total of the products ( Σfdx ) is determined.
(iii) We again multiply each fdx by dx to find fdx², and then find their total ( Σfdx² ).

(iv) We calculate standard deviation with the help of the following formula :

$\sigma = \sqrt{\frac{\sum fdx^2}{N} - \left(\frac{\sum fdx}{N}\right)^2}$

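Illustration 5.
Calculate standard deviation from the following data :
X : 5 6 7 8 9 10 11 12
f : 3 5 8 12 22 14 6 2
Solution :
X     f     dx ( A = 8 )    fdx    fdx × dx ( fdx² )
5     3     -3              -9     27
6     5     -2              -10    20
7     8     -1              -8     8
8     12    0               0      0
9     22    1               22     22
10    14    2               28     56
11    6     3               18     54
12    2     4               8      32
N = 72      Σfdx = 49              Σfdx² = 219

$\sigma = \sqrt{\frac{\sum fdx^2}{N} - \left(\frac{\sum fdx}{N}\right)^2} = \sqrt{\frac{219}{72} - \left(\frac{49}{72}\right)^2} = 1.61$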
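The short-cut method for a discrete series can be sketched in the same way. The code below uses the data of Illustration 5 with the assumed mean A = 8; the function name is hypothetical.

```python
import math

def std_dev_discrete_short_cut(values, freqs, assumed_mean):
    """Short-cut method for a discrete series using deviations from an assumed mean."""
    n = sum(freqs)
    sum_fdx = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    sum_fdx2 = sum(f * (x - assumed_mean) ** 2 for x, f in zip(values, freqs))
    sigma = math.sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2)
    return n, sum_fdx, sum_fdx2, sigma

x = [5, 6, 7, 8, 9, 10, 11, 12]
f = [3, 5, 8, 12, 22, 14, 6, 2]
n, sum_fdx, sum_fdx2, sigma = std_dev_discrete_short_cut(x, f, 8)
print(n, sum_fdx, sum_fdx2, round(sigma, 2))
```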
Calculation of Standard Deviation by Square of Values Method

The standard deviation in this method is calculated in the following way :

(i) By multiplying each item value by its frequency, the sum of the products ( ΣfX ) is determined.

(ii) Now each product ( fX ) is again multiplied by X and the total of these products ( ΣfX² ) is determined.
(iii) Standard deviation is calculated by using the following formula :

$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}$ where N = Σf
(C) Calculation of Standard Deviation in Continuous Series
Calculation of standard deviation in a continuous series is done in the same way as in a discrete series. Here the mid value of each class is taken and treated as X. In a continuous series the standard deviation can be calculated by all the aforesaid methods.
Illustration 6. (By Direct Method)
Find second moment of dispersion from the following series :
Class       0-10  10-20  20-30  30-40  40-50  50-60
Frequency   6     7      9      16     8      4
Solution :
Class    Mid value ( X )    Frequency ( f )    fX     X - $\bar{X}$ ( d )    d²     fd²
0-10     5                  6                  30     -25              625    3750
10-20    15                 7                  105    -15              225    1575
20-30    25                 9                  225    -5               25     225
30-40    35                 16                 560    5                25     400
40-50    45                 8                  360    15               225    1800
50-60    55                 4                  220    25               625    2500
                            N = 50             ΣfX = 1500                     Σfd² = 10250

$\bar{X} = \frac{\sum fX}{N} = \frac{1500}{50} = 30$

Standard Deviation $\sigma = \sqrt{\frac{\sum fd^2}{N}} = \sqrt{\frac{10250}{50}} = \sqrt{205} = 14.32$


Illustration 7. (By Short-cut Method)
Find standard deviation and its coefficient from the following data :
Age in years     0-10  10-20  20-30  30-40  40-50  50-60
No. of Persons   8     12     15     7      5      3
Solution :
Age (in years)    Mid value ( X )    No. of Persons ( f )    Deviation from assumed mean dx ( A = 30 )    fdx     fdx × dx ( fdx² )
0-10              5                  8                       -25                                          -200    5000
10-20             15                 12                      -15                                          -180    2700
20-30             25                 15                      -5                                           -75     375
30-40             35                 7                       5                                            35      175
40-50             45                 5                       15                                           75      1125
50-60             55                 3                       25                                           75      1875
                                     N = 50                                                               Σfdx = -270    Σfdx² = 11250

Standard Deviation $\sigma = \sqrt{\frac{\sum fdx^2}{N} - \left(\frac{\sum fdx}{N}\right)^2} = \sqrt{\frac{11250}{50} - \left(\frac{-270}{50}\right)^2} = \sqrt{225 - 29.16}$

= 13.99 years

Arithmetic Mean $\bar{X} = A + \frac{\sum fdx}{N}$

$= 30 + \frac{-270}{50} = 24.6$ years

Coefficient of Standard Deviation = $\frac{\sigma}{\bar{X}} = \frac{13.99}{24.6} = 0.569$


Illustration 8. (By Step Deviation Method)
Calculate coefficient of standard deviation by step-deviation method from the following data :

Class       50-60  60-70  70-80  80-90  90-100
Frequency   5      8      32     12     3
Solution :
Class     Mid value X    f     dx ( A = 75 )    dx' ( i = 10 )    fdx'    fdx'²
50-60     55             5     -20              -2                -10     20
60-70     65             8     -10              -1                -8      8
70-80     75             32    0                0                 0       0
80-90     85             12    10               1                 12      12
90-100    95             3     20               2                 6       12
                         N = 60                                   Σfdx' = 0    Σfdx'² = 52

Standard Deviation $\sigma = \sqrt{\frac{\sum fdx'^2}{N} - \left(\frac{\sum fdx'}{N}\right)^2} \times i$

$\sigma = \sqrt{\frac{52}{60} - \left(\frac{0}{60}\right)^2} \times 10$

$= \sqrt{0.8667} \times 10 = 0.9309 \times 10 = 9.309$

Coefficient of S.D. = $\frac{\sigma}{\bar{X}}$

$\bar{X} = A + \frac{\sum fdx'}{N} \times i$

$= 75 + \frac{0}{60} \times 10 = 75$

Hence coefficient of S.D. = $\frac{9.309}{75} = 0.124$
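The step-deviation working of Illustration 8 can be sketched in Python as follows. Classes are given as (lower, upper) pairs; the function name and this representation are assumptions made for the example, and the class width i is passed in explicitly.

```python
import math

def std_dev_step_deviation(classes, freqs, assumed_mean, width):
    """Step-deviation method for a continuous series with equal class widths."""
    mids = [(lo + hi) / 2 for lo, hi in classes]            # mid values X
    steps = [(m - assumed_mean) / width for m in mids]      # dx' = (X - A) / i
    n = sum(freqs)
    sum_f_step = sum(f * s for f, s in zip(freqs, steps))
    sum_f_step2 = sum(f * s * s for f, s in zip(freqs, steps))
    mean = assumed_mean + sum_f_step / n * width
    sigma = math.sqrt(sum_f_step2 / n - (sum_f_step / n) ** 2) * width
    return mean, sigma

classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [5, 8, 32, 12, 3]
mean, sigma = std_dev_step_deviation(classes, freqs, assumed_mean=75, width=10)
coefficient = sigma / mean
print(mean, round(sigma, 3), round(coefficient, 3))
```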
Illustration 9.

Find arithmetic mean and standard deviation for the following frequency distribution. Also calculate coefficient of standard deviation :

Class interval   0-5  5-10  10-15  15-20  20-25
Frequency        2    8     15     3      2
Solution :

Here i = magnitude of class interval = 5.
Class Interval    Frequency    Mid value ( X )    dx'    fdx'    fdx'²
0-5               2            2.5                -2     -4      8
5-10              8            7.5                -1     -8      8
10-15             15           12.5 = A           0      0       0
15-20             3            17.5               1      3       3
20-25             2            22.5               2      4       8
Total             30                                     -5      27

Arithmetic Mean $\bar{X} = A + \frac{\sum fdx'}{N} \times i = 12.5 + \frac{-5}{30} \times 5 = 12.5 - 0.83 = 11.67$

Standard Deviation $\sigma = \sqrt{\frac{\sum fdx'^2}{N} - \left(\frac{\sum fdx'}{N}\right)^2} \times i$

$= \sqrt{\frac{27}{30} - \left(\frac{-5}{30}\right)^2} \times 5$
$= 5\sqrt{0.9 - 0.0278} = 5\sqrt{0.8722} = 5 \times 0.9339 = 4.67$
Coefficient of S.D. = $\frac{\sigma}{\bar{X}}$

$= \frac{4.67}{11.67} = 0.40$
Note : If we have to find coefficient of variation, then
C.V. = $\frac{\sigma}{\bar{X}} \times 100$

$= \frac{4.67}{11.67} \times 100 = 40\%$
Calculation of Standard Deviation in Continuous Series by Summation Method

This method can be used in a continuous series only when the widths of the class intervals are the same throughout the series. In this method the calculation is done as follows :

(i) We find the cumulative frequencies of the frequencies in the given series. These are called the first cumulative frequencies ( cf₁ ).

(ii) We cumulate the first cumulative frequencies once more. The result is called the second cumulative frequency ( cf₂ ).

(iii) By dividing the total of the first cumulative frequencies ( Σcf₁ ) by the total frequency ( Σf ), the quotient F₁ is determined.

$F_1 = \frac{\sum cf_1}{\sum f}$

(iv) By dividing the total of the second cumulative frequencies ( Σcf₂ ) by the total frequency ( Σf ), the quotient F₂ is determined.

$F_2 = \frac{\sum cf_2}{\sum f}$

(v) To calculate standard deviation the following formula is used :

$\sigma = \sqrt{2F_2 - F_1 - F_1^2} \times i$

Here, i = width of class interval, which should be the same for all classes in the series.


Illustration 10.
Calculate standard deviation by summation method for the following data :

Age in years   10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency      2      4      4      8      6      3      2

Solution :

Age (in years)    f     cf₁    cf₂
10-20             2     2      2
20-30             4     6      8
30-40             4     10     18
40-50             8     18     36
50-60             6     24     60
60-70             3     27     87
70-80             2     29     116
                  N = 29    Σcf₁ = 116    Σcf₂ = 327

$F_1 = \frac{116}{29} = 4$, $F_2 = \frac{327}{29} = 11.28$

$\sigma = \sqrt{2F_2 - F_1 - F_1^2} \times i$

$= \sqrt{2 \times 11.28 - 4 - 4^2} \times 10 = \sqrt{22.56 - 4 - 16} \times 10$

$= \sqrt{2.56} \times 10 = 1.6 \times 10 = 16$ years
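The summation method can be verified numerically against the direct method. The sketch below uses the data of Illustration 10; the helper names are our own, and `itertools.accumulate` is used for the running totals. The two results agree because 2F₂ - F₁ - F₁² equals the variance of the class positions counted in units of the class width.

```python
import math
from itertools import accumulate

def std_dev_summation(freqs, width):
    """Summation method: sigma = sqrt(2*F2 - F1 - F1^2) * i for equal class widths."""
    n = sum(freqs)
    cf1 = list(accumulate(freqs))   # first cumulative frequencies
    cf2 = list(accumulate(cf1))     # second cumulative frequencies
    f1 = sum(cf1) / n
    f2 = sum(cf2) / n
    return sum(cf1), sum(cf2), math.sqrt(2 * f2 - f1 - f1 ** 2) * width

def std_dev_from_mid_values(mids, freqs):
    """Direct method on class mid values, used here only as a cross-check."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mids, freqs)) / n
    return math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n)

freqs = [2, 4, 4, 8, 6, 3, 2]
total_cf1, total_cf2, sigma_sum = std_dev_summation(freqs, width=10)
sigma_direct = std_dev_from_mid_values([15, 25, 35, 45, 55, 65, 75], freqs)
print(total_cf1, total_cf2, round(sigma_sum, 2), round(sigma_direct, 2))
```

Without intermediate rounding the value is slightly below 16 years; the hand calculation reaches exactly 16 because F₂ is rounded to 11.28.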
Comparison of Two or More Series : Coefficient of Variation

This measure was used for the first time by Karl Pearson, hence it is also called Karl Pearson's coefficient of variation. Since standard deviation is an absolute measure, two series cannot be compared directly by it. The coefficient of variation is determined for this purpose, and with its help the variability, stability or consistency of two or more series is compared. The series which has the greater coefficient of variation is more variable. In other words, the series which has the smaller coefficient of variation is more stable or consistent. The following is the formula to determine it :

Coefficient of Variation ( C.V. ) = $\frac{\sigma}{\bar{X}} \times 100$


Illustration 11.
Calculate standard deviation and coefficient of variation from the following data :

Class       0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency   12    18     35     42     50     45     20     8

Solution :

Class    Mid value X    Frequency f    dx ( A = 35 )    fdx     fdx²
0-10     5              12             -30              -360    10800
10-20    15             18             -20              -360    7200
20-30    25             35             -10              -350    3500
30-40    35             42             0                0       0
40-50    45             50             10               500     5000
50-60    55             45             20               900     18000
60-70    65             20             30               600     18000
70-80    75             8              40               320     12800
                        N = 230                         Σfdx = 1250    Σfdx² = 75300

$\sigma = \sqrt{\frac{\sum fdx^2}{N} - \left(\frac{\sum fdx}{N}\right)^2} = \sqrt{\frac{75300}{230} - \left(\frac{1250}{230}\right)^2}$
$= \sqrt{327.39 - 29.54} = \sqrt{297.85} = 17.258$
$\bar{X} = A + \frac{\sum fdx}{N} = 35 + \frac{1250}{230} = 40.435$
Coefficient of Variation C.V. = $\frac{\sigma}{\bar{X}} \times 100$
$= \frac{17.258}{40.435} \times 100 = 42.68\%$
Illustration 12.

Following are the scores made by two batsmen Ayush and Piyush in a series of 10 innings. Who is the better run getter on the average ? Who is more consistent ?

Ayush    12  115  6   73  7  19  119  36  84  29
Piyush   47  12   76  42  4  51  37   48  13  0
Solution :

To know who is the better run getter on the average, the means of the two series will be calculated, and to judge consistency the coefficients of variation will be calculated.

Ayush ( A = 50 )              Piyush ( A = 33 )
X      dx     dx²             Y      dy     dy²
12     -38    1444            47     +14    196
115    +65    4225            12     -21    441
6      -44    1936            76     +43    1849
73     +23    529             42     +9     81
7      -43    1849            4      -29    841
19     -31    961             51     +18    324
119    +69    4761            37     +4     16
36     -14    196             48     +15    225
84     +34    1156            13     -20    400
29     -21    441             0      -33    1089
Σdx = 0    Σdx² = 17498       Σdy = 0    Σdy² = 5462

Calculation of Mean
Ayush : $\bar{X} = A + \frac{\sum dx}{N} = 50 + \frac{0}{10} = 50$ runs
Piyush : $\bar{Y} = A + \frac{\sum dy}{N} = 33 + \frac{0}{10} = 33$ runs
So Ayush is the better run getter on the average.

Calculation of coefficient of variation

Ayush : $\sigma = \sqrt{\frac{17498}{10} - \left(\frac{0}{10}\right)^2} = 41.83$
Piyush : $\sigma = \sqrt{\frac{5462}{10} - \left(\frac{0}{10}\right)^2} = 23.37$

Ayush : C.V. = $\frac{41.83}{50} \times 100 = 83.66\%$
Piyush : C.V. = $\frac{23.37}{33} \times 100 = 70.818\%$

Since the C.V. of Ayush is greater, he is more inconsistent. In other words, Piyush is more consistent in scoring runs than Ayush.
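The comparison of the two batsmen can be reproduced with Python's standard `statistics` module. This is a small sketch: `coefficient_of_variation` is a helper we define here, and `statistics.pstdev` is used because the text divides by N rather than N - 1.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) * 100."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

ayush = [12, 115, 6, 73, 7, 19, 119, 36, 84, 29]
piyush = [47, 12, 76, 42, 4, 51, 37, 48, 13, 0]

mean_ayush, mean_piyush = statistics.mean(ayush), statistics.mean(piyush)
cv_ayush = coefficient_of_variation(ayush)
cv_piyush = coefficient_of_variation(piyush)

better_average = "Ayush" if mean_ayush > mean_piyush else "Piyush"
more_consistent = "Ayush" if cv_ayush < cv_piyush else "Piyush"
print(better_average, more_consistent, round(cv_ayush, 2), round(cv_piyush, 2))
```

The batsman with the higher mean is the better run getter, while the one with the lower C.V. is the more consistent, which matches the conclusion of Illustration 12.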


Illustration 13.
From the following data, state which series is more variable :

Class      10-12  12-14  14-16  16-18  18-20  20-22
Series A   15     22     28     40     13     2
Series B   12     38     20     15     12     3

Solution :
Calculation of mean and coefficient of variation

                        For series A ( A = 15 )          For series B ( A = 15 )
Class    Mid value X    f     dx    fdx    fdx²          f     dx    fdx    fdx²
10-12    11             15    -4    -60    240           12    -4    -48    192
12-14    13             22    -2    -44    88            38    -2    -76    152
14-16    15             28    0     0      0             20    0     0      0
16-18    17             40    2     80     160           15    2     30     60
18-20    19             13    4     52     208           12    4     48     192
20-22    21             2     6     12     72            3     6     18     108
                        N = 120    Σfdx = 40    Σfdx² = 768      N = 100    Σfdx = -28    Σfdx² = 704

In series A :
$\bar{X} = A + \frac{\sum fdx}{N} = 15 + \frac{40}{120} = 15.33$
$\sigma = \sqrt{\frac{768}{120} - \left(\frac{40}{120}\right)^2} = \sqrt{6.4 - 0.111} = \sqrt{6.289} = 2.508$
C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{2.508}{15.33} \times 100 = 16.36\%$

In series B :
$\bar{X} = A + \frac{\sum fdx}{N} = 15 + \frac{-28}{100} = 15 - 0.28 = 14.72$
$\sigma = \sqrt{\frac{704}{100} - \left(\frac{-28}{100}\right)^2} = \sqrt{7.04 - 0.0784} = \sqrt{6.9616} = 2.638$
C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{2.638}{14.72} \times 100 = 17.92\%$

Since series B has the greater coefficient of variation, there is more variation in the data of series B.


Illustration 14.

Following are the marks of 8 students in two subjects, Maths and English. Find out in which subject the students are more consistent :

Maths     30  38  75  60  45  55  45  50
English   40  70  65  50  45  53  57  60

Solution :
For this, the coefficient of variation will be calculated separately for both series.

Maths ( X )       30    38    75    60    45    55    45    50    Total
dx = ( X - 50 )   -20   -12   25    10    -5    5     -5    0     -2
dx²               400   144   625   100   25    25    25    0     1344

Arithmetic Mean = $A + \frac{\sum dx}{N} = 50 + \frac{-2}{8}$
= 50 - 0.25 = 49.75

Standard Deviation $\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2} = \sqrt{\frac{1344}{8} - \left(\frac{-2}{8}\right)^2}$
$= \sqrt{168 - 0.0625} = \sqrt{167.9375} = 12.96$
Coeff. of variation for Maths = (C.V.) Maths = $\frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$
$= \frac{12.96}{49.75} \times 100 = 26.05$

English ( X )     40    70    65    50    45    53    57    60    Total
dx = ( X - 55 )   -15   15    10    -5    -10   -2    2     5     0
dx²               225   225   100   25    100   4     4     25    708

Arithmetic Mean = $A + \frac{\sum dx}{N} = 55 + \frac{0}{8} = 55$

Standard Deviation $\sigma = \sqrt{\frac{708}{8} - \left(\frac{0}{8}\right)^2} = \sqrt{88.5} = 9.407$

Coeff. of variation for English = (C.V.) Eng = $\frac{9.407}{55} \times 100 = 17.10$
(C.V.) Eng < (C.V.) Maths

Students are more consistent in English than in Maths.
Relation between Different Measures of Dispersion

The following relations hold between the different measures of dispersion in a normal, symmetrical or moderately asymmetrical distribution :

1. Relation between mean deviation (M.D.) and standard deviation (S.D. or σ) :

Mean deviation (M.D.) = $\frac{4}{5}\sigma$ = 0.7979 σ
or Standard Deviation σ = $\frac{5}{4}$ M.D.
2. Relation between Quartile Deviation and Standard Deviation :
Quartile Deviation (Q.D.) = $\frac{2}{3}\sigma$ = 0.6745 σ
or Standard Deviation σ = $\frac{3}{2}$ Q.D.
3. Relation between Quartile Deviation and Mean Deviation :
Quartile Deviation (Q.D.) = $\frac{5}{6}$ M.D.
or Mean Deviation M.D. = $\frac{6}{5}$ Q.D.
4. Relation among Quartile Deviation, Mean Deviation and Standard Deviation :
6 times the standard deviation = 7.5 times the mean deviation = 9 times the quartile deviation

i.e., 6 σ = 7.5 M.D. = 9 Q.D.


Illustration 15.

If the mean deviation of a certain series is 8.36, find the probable value of standard deviation and quartile deviation.
Solution :

Standard Deviation = $\frac{5}{4}$ × Mean Deviation
$= \frac{5}{4} \times 8.36 = 10.45$
Quartile Deviation Q.D. = $\frac{5}{6}$ × Mean Deviation
$= \frac{5}{6} \times 8.36 = 6.97$ (Approx.)
Illustration 16.
If Q₁ = 22 and Q₃ = 32, find standard deviation.

Solution :
Quartile Deviation Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{32 - 22}{2} = 5$
Now, Standard Deviation = $\frac{3}{2}$ Q.D. = $\frac{3}{2} \times 5 = 7.5$

Illustration 17.
If Q₁ = 48 and Q₃ = 60, find mean deviation.

Solution :

Quartile Deviation Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{60 - 48}{2} = 6$
Now, M.D. = $\frac{6}{5}$ Q.D.

M.D. = $\frac{6}{5} \times 6 = \frac{36}{5} = 7.2$
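The empirical relations used in Illustrations 15 to 17 can be collected into a few one-line helpers. The function names below are hypothetical, and the results are only estimates that hold approximately for normal or moderately asymmetrical distributions.

```python
def sd_from_md(md):
    """Estimate standard deviation from mean deviation: sigma = 5/4 M.D."""
    return 5 / 4 * md

def qd_from_md(md):
    """Estimate quartile deviation from mean deviation: Q.D. = 5/6 M.D."""
    return 5 / 6 * md

def sd_from_qd(qd):
    """Estimate standard deviation from quartile deviation: sigma = 3/2 Q.D."""
    return 3 / 2 * qd

def md_from_qd(qd):
    """Estimate mean deviation from quartile deviation: M.D. = 6/5 Q.D."""
    return 6 / 5 * qd

# Illustration 15: M.D. = 8.36
print(round(sd_from_md(8.36), 2), round(qd_from_md(8.36), 2))
# Illustration 16: Q1 = 22, Q3 = 32
print(sd_from_qd((32 - 22) / 2))
# Illustration 17: Q1 = 48, Q3 = 60
print(round(md_from_qd((60 - 48) / 2), 1))
```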

Illustration 18.

Bhilai Steel Plant decided to give old age pension to people over 60 years of age in the following manner :
Age group                60-65  65-70  70-75  75-80  80-85
Pension per month ( ₹ )  20     25     30     35     40
The ages of 25 persons who secured the pension right are given below :
64, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60, 67, 74, 66, 74, 79, 73, 75, 76, 69, 68, 78, 67
Calculate the total pension payable per month, the average monthly pension and the standard deviation of pension.
Solution :
Prepare the frequency distribution of the 25 persons according to age group :

Age group    Tally Marks    No. of Persons ( f )    Pension per month ( X )    fX     ds = ( X - 30 ) / 5    fds    fds²
60-65        |||| ||        7                       20                         140    -2                     -14    28
65-70        ||||           5                       25                         125    -1                     -5     5
70-75        |||| |         6                       30                         180    0                      0      0
75-80        ||||           4                       35                         140    1                      4      4
80-85        |||            3                       40                         120    2                      6      12
Total                       25                                                 705                           -9     49

Total pension payable per month = ΣfX = ₹ 705 (or N × $\bar{X}$ = 25 × 28.2 = ₹ 705.0)
Average monthly pension $\bar{X} = A + \frac{\sum fds}{N} \times i$
$= 30 + \frac{-9}{25} \times 5 = 30 - 1.8$ = ₹ 28.2

Standard Deviation $\sigma = \sqrt{\frac{\sum fds^2}{N} - \left(\frac{\sum fds}{N}\right)^2} \times i$

$= \sqrt{\frac{49}{25} - \left(\frac{-9}{25}\right)^2} \times 5$

$= \sqrt{1.96 - 0.1296} \times 5 = \sqrt{1.8304} \times 5$

= 1.353 × 5 = ₹ 6.765
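Illustration 18 can also be worked directly from the raw ages, which doubles as a check on the tally. The sketch below assumes, as the pension table implies, that each 5-year age group starting at 60 adds ₹ 5 to a base of ₹ 20; the helper `pension_for` is our own.

```python
import math
from collections import Counter

ages = [64, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60,
        67, 74, 66, 74, 79, 73, 75, 76, 69, 68, 78, 67]

def pension_for(age):
    """Monthly pension in rupees: 20 for 60-65, rising by 5 per 5-year age group."""
    return 20 + 5 * ((age - 60) // 5)

pensions = [pension_for(a) for a in ages]
freq = Counter(pensions)             # number of persons at each pension level
total = sum(pensions)                # total pension payable per month
mean = total / len(pensions)         # average monthly pension
sigma = math.sqrt(sum((p - mean) ** 2 for p in pensions) / len(pensions))
print(dict(sorted(freq.items())), total, round(mean, 1), round(sigma, 3))
```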

Merits of Standard Deviation 

Standard deviation has the following merits : 

(1) Based on all values : This measure is determined on the basis of all the item values in the series.

(2) Less effect of fluctuations of sampling : Standard deviation is less affected by fluctuations of sampling than the other measures of dispersion.

(3) Clear and definite measure : It is the clearest and most definite of all the measures of dispersion.

(4) Capable of algebraic treatment : Standard deviation is capable of further algebraic treatment. It is used in higher statistical measures.

(5) Use in higher mathematical study : It is the most accurate measure of dispersion, so it is used in higher mathematical study.
Demerits of Standard Deviation

(1) Difficult calculation : Its calculation is harder than that of the other measures of dispersion.

(2) More effect of extreme values : It is determined with the help of the arithmetic mean and squared deviations, so it is more affected by extreme values.
(3) Difficult to understand : Due to the complexity of its calculation it is not easily understood by the common man.

Utility of Standard Deviation

Just as the arithmetic mean is the most used measure of central tendency, standard deviation is the most used measure of dispersion. Standard deviation is used in the calculation of correlation, regression etc. in statistics.


Important Formulae
(Formulae 1 to 4 have the same form for individual, discrete and continuous series.)
1. Range R = Largest value - Smallest value
Coeff. of Range C.R. = $\frac{\text{Largest value} - \text{Smallest value}}{\text{Largest value} + \text{Smallest value}}$
2. Inter Quartile Range I.Q.R. = Q₃ - Q₁
Coeff. C.I.Q.R. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
3. Percentile Range P.R. = P₉₀ - P₁₀

4. Quartile deviation Q.D. = $\frac{Q_3 - Q_1}{2}$
Coefficient C.Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
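5. Mean Deviation
(The first form is for an individual series; the second form, with frequencies, is for discrete and continuous series.)
From mean : $\delta_{\bar{X}} = \frac{\sum |d\bar{X}|}{N}$ ; $\delta_{\bar{X}} = \frac{\sum f|d\bar{X}|}{N}$
From median : $\delta_M = \frac{\sum |dM|}{N}$ ; $\delta_M = \frac{\sum f|dM|}{N}$
From mode : $\delta_Z = \frac{\sum |dZ|}{N}$ ; $\delta_Z = \frac{\sum f|dZ|}{N}$

Plus (+) and minus (-) signs are left out while taking the deviations about mean, median or mode, that is, all deviations are considered positive.
6. Coefficient of M.D.
From mean : $\frac{\delta_{\bar{X}}}{\bar{X}}$
From median : $\frac{\delta_M}{M}$
From mode : $\frac{\delta_Z}{Z}$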
Short-cut Method 


From median °°! 


| dig: | +(M -— ANp - Na) 


Second short-cut * >" 1"s a 
method 


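7. Standard Deviation
(The first form is for an individual series; the second form, with frequencies, is for discrete and continuous series.)
Direct method : $\sigma = \sqrt{\frac{\sum d^2}{N}}$ ; $\sigma = \sqrt{\frac{\sum fd^2}{N}}$
Short-cut method : $\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2}$ ; $\sigma = \sqrt{\frac{\sum fdx^2}{N} - \left(\frac{\sum fdx}{N}\right)^2}$

8. Variance = $\sigma^2$
9. Coefficient of Variation C.V. = $\frac{\sigma}{\bar{X}} \times 100$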
Exercise 8 (C)
1. Find the standard deviation and its coefficient by the square of values method or the short-cut method :
X = 59, 48, 65, 57, 31, 60, 37, 48, 78, 43
[ Ans. : σ = 13.26]
2. The table below gives the marks obtained by B. Com. students with Roll No. 1 to 10 at an examination. Calculate standard deviation :
Roll No.  1   2   3   4   5   6   7   8   9   10
Marks     43  48  65  57  31  60  37  48  78  59
[ Ans. : σ = 13.26 marks]
3. Find standard deviation from the following data :
X :  6  7  8  9   10  11  12
f :  3  6  9  13  8   5   4
[ Ans. : σ = 1.607]

4. Find the mean deviation and standard deviation from the following :

S.No.  1  2  3  4
Marks  5  7  9  11

[ Ans. : σ = 2.236, δ = 2]

5. Sales for five years are given below. Find the coefficient of variation :

Sales (in '000 ₹)  230  390  582  799  1035

[ Ans. : $\bar{X}$ = ₹ 607.2, σ = ₹ 286.39, C.V. = 47.15%]

6. Find out mean, standard deviation and its coefficient of the following frequency distribution :

No. of Accidents   0   1   2   3   4   5  6  7  8  9  10  11  12
Persons Involved   16  16  21  10  16  8  4  2  1  2  2   0   2

[ Ans. : $\bar{X}$ = 3, σ = 2.65, Coefficient of S.D. = 0.883]

7. Find the mean deviation from mean and standard deviation of examination marks of 75 students :

Marks            20  25  30  35  40  45  50  55  60  65  70  75  80  85  90
No. of Students  2   4   7   5   3   8   6   7   5   9   8   4   3   2   2

[ Ans. : $\bar{X}$ = 53.67 marks, δ = 15.29 marks, σ = 18.06 marks]

8. Find the standard deviation and coefficient of variation of the following numbers by grouping the numbers in class intervals of 10 :

40, 43, 43, 46, 46, 46, 54, 56, 59, 62, 64, 64, 66, 66, 67, 67, 68, 68, 69, 69, 69, 71,
75, 75, 76, 76, 78, 80, 82, 82, 82, 82, 82, 83, 84, 86, 88, 90, 90, 91, 91, 92, 95,
102, 127

[ Ans. : $\bar{X}$ = 73.67, σ = 17.96, C.V. = 24.38%]

9. The following is the age distribution of 100 members of parliament. Find the standard deviation and its coefficient :

Age (in yrs.)    30-40  40-50  50-60  60-70  70-80  80-90
No. of Members   3      51     25     5      9      7
[ Ans. : $\bar{X}$ = 53.7 years, σ = 12.8 years, C.V. = 24%] (Sagar 2005)

10. Calculate the arithmetic mean, median, mode and coefficient of variation of the frequency distribution given below :
Class Intervals  5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50
Frequencies      1     2      5      7      10     7      5      2      1
[ Ans. : $\bar{X}$ = 27.5, Z = 27.5, M = 27.5, C.V. = 31.74%]
11. Find standard deviation and its coefficient from the following data :
Age (in years)  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of persons  15    15     23     22     25     10     5      10

[ Ans. : $\bar{X}$ = 35.16 yrs, σ = 19.76 yrs, C.V. = 56%]
12. The classification of 58 passengers of a bus in their age groups is given below. You are required to calculate the standard deviation of the ages and its coefficient :
Age (in years)  10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency       4      8      8      16     12     6      4
[ Ans. : $\bar{X}$ = 45 years, σ = 16 years, coefficient of S.D. = 0.356]
13. Calculate standard deviation from the following data :
Temperature  -40° to -30°  -30° to -20°  -20° to -10°  -10° to 0°  0° to 10°  10° to 20°  20° to 30°
No. of days  10            28            30            42          65         180         10

[ Ans. : σ = 14.75°] (Indore 2003, 06; Ravishankar 2004)
14. Calculate standard deviation and coefficient of variation from the following data :
Value (more than)  70  60  50  40  30  20
Frequency          7   18  40  40  63  65
[ Ans. : $\bar{X}$ = 50.85, σ = 14.56, C.V. = 28.6%]
15. Calculate the standard deviation and its coefficient of the following series :
Marks (more than)  0    10  20  30  40  50  60  70
No. of Students    100  90  75  50  20  10  5   0
[ Ans. : σ = 15 marks, $\bar{X}$ = 30 marks, coeff. of S.D. = 0.5] (Bilaspur 2006)
Relation between Various Measures of Dispersion
16. In a moderately asymmetrical distribution the mean deviation is 4. Estimate the quartile deviation and the standard deviation.
[ Ans. : Q.D. = 3.33, σ = 5]

17. If the standard deviation of a frequency distribution is 30, estimate the mean deviation and the quartile deviation.

[ Ans. : δ = 24, Q.D. = 20]

18. The mean and standard deviation of a normal distribution were 60 and 5 respectively. Find the inter-quartile range, the mean deviation and the coefficient of variation.

[ Ans. : δ = 4, I.Q.R. = 6.67, C.V. = 8.33%]

19. Find out the coefficient of variation if :

(a) σ = 3.5, N = 10, ΣX = 145
(b) Variance = 148.6, Mean = 40.

[ Ans. : (a) C.V. = 24.14%, (b) C.V. = 30.47%]
20. Calculate the coefficient of variation of a series on the basis of the following results :

N = 50, ΣX = -10, ΣX² = 400
where X = deviation from the assumed mean 7.5 and N = number of items.
[ Ans. : C.V. = 38%]
21. The mean of 100 items is 50 and their S.D. is 4. Find the sum of all the items and also the sum of the squares of the items.
[ Ans. : ΣX = 5,000, ΣX² = 2,51,600]
Comparison of Two or More Series

22. A sample of 5 items was taken from the production in a certain establishment. The length and the weight of those 5 items are as follows :

Length (in inches)  3  4   6   7   10
Weight (in kg)      9  11  14  15  16
Find the coefficient of variation of the two and say which of them is more variable.

[ Ans. : C.V. for length = 40.8%, C.V. for weight = 20%, Length is more variable.]
23. From the prices of shares of Gourav Ltd. and Ishan Ltd. given below, state which is more stable in value :
Prices of Shares of Gourav Ltd.  55   54   52   53   56   58   52   50   51   49
Prices of Shares of Ishan Ltd.   108  107  105  105  106  107  104  103  104  101
(Jabalpur 2006)

[ Ans. : C.V. of Gourav Ltd. = 4.99%, C.V. of Ishan Ltd. = 1.90%, Ishan Ltd. has more stable prices.]
24. Prices of a particular commodity in five years in two cities, Indore and Khandwa, are given below. Find the city which had more stable prices :
Indore   20  22  19  23  16
Khandwa  10  20  18  12  15

[ Ans. : C.V. for Indore = 12.25%, C.V. for Khandwa = 24.6%, Indore city has more stable prices.]
25. From the following figures of population (in '000) compare the variability :
Years   1951  1961  1971  1981  1991  2001  2011
City A  160   175   172   172   157   184   261
City B  218   223   213   204   198   205   263

[ Ans. : C.V. of A = 18%, C.V. of B = 9.25%, Population of city A is more variable.]
26. The following table gives the goals scored by two teams A and B in a football season. Find the team which is more consistent in its performance :
No. of Goals scored    No. of football matches played
                       Team A    Team B
0                      27        17
1                      9         9
2                      8         6
3                      5         5
4                      4         3

[ Ans. : C.V. for A = 123.68%, C.V. of B = 109%; Team B is more consistent]
(Bilaspur 2006)

27. The following table gives the distribution of wages in the two branches of an industrial concern :
                       No. of Workers
Monthly Wages ( ₹ )    Branch A    Branch B
100-150                167         63
150-200                207         93
200-250                253         157
250-300                205         105
300-350                168         82

Find out the arithmetic mean and the standard deviation for the two branches separately and state :

(i) Which branch pays the higher average wage per month ?

(ii) Which branch has greater variability in wages relative to the average wage ? and

(iii) What is the average monthly wage for the concern as a whole ?

[ Ans. : $\bar{X}_A$ = ₹ 225, $\bar{X}_B$ = ₹ 230, C.V. of A = 29.4%, C.V. of B = 26.96%,
(i) Branch B pays higher wages.
(ii) Variability is greater in case of branch A.
(iii) Average monthly wage for the whole concern = ₹ 226.67]
28. Data regarding lives of two new models of refrigerators collected in a recent survey are :

Life (in years)  0-2  2-4  4-6  6-8  8-10  10-12
Model A          5    16   13   7    5     4
Model B          2    7    12   19   9     1

What is the average life of each model of these refrigerators ? Which model has greater uniformity ?

[ Ans. : $\bar{X}$ of A = 5.12 years, σ of A = 2.812 yrs., C.V. of A = 54.92%, $\bar{X}$ of B = 6.16 years, σ of B = 2.23 years, C.V. of B = 36.2%; model B has greater uniformity.]


THEORETICAL QUESTIONS 


Long Answer Questions 

1. Explain the term Dispersion. What are the various methods of measuring 

dispersion ? (Jiwaji 2006) 

2. What is mean deviation ? How is it calculated ? 

3. What is coefficient of variation ? What purpose does it serve ?

4. Discuss the relative merits and demerits of the various measure of dispersion. 

5. Discuss the comparative usefulness of the measures of dispersion. 

6. Explain why the standard deviation is regarded as superior to other measures 
of dispersion. What is its chief defect ? 

7. What is meant by dispersion ? What are the various methods of measuring 
dispersion ? Explain any one of them. 

8. What do you mean by standard deviation ? Differentiate between standard 


deviation and mean deviation. (Ravishankar 2006) 
Short Answer Questions

1. Define coefficient of variation.
2. Describe the utility of variance.
3. Define mean deviation. Write its merits and demerits.
4. Define standard deviation.
5. Write merits and demerits of Quartile deviation.
6. Give the definition of range and comment on its merits and demerits.


OBJECTIVE QUESTIONS
State whether the following statements are true or false :
1. Absolute measures of dispersion should be used for comparison of deviations in two or more groups.
2. Mean deviation is an absolute measure.
3. Quartile deviation is half the distance along the scale between the first and the third quartile.
4. There is no relation between averages and dispersions.
5. Coefficient of variation is an absolute measure.
6. The standard deviation of 20, 20, 20, 20 and 20 is zero.
7. The range of 0, -2, 3, 5 and -6 is 11.
8. Mean deviation is calculated from the mean alone.
9. The mean deviation about mean is 4/5 of standard deviation.
10. Quartile deviation is 2/3 of standard deviation.
[ Ans. 1. False, 2. True, 3. True, 4. False, 5. False, 6. True, 7. True, 8. False, 9. True, 10. True ]
Choose the correct answers

1. The measure based on all the values of the variable is :
(a) Range (b) Standard deviation
(c) Quartile deviation (d) None of these

2. In the case of open end class intervals the suitable measure of dispersion is :
(a) Mean (b) Standard deviation
(c) Quartile deviation (d) None of these

3. If N = 9 and variance = 169, standard deviation will be :
(a) 13 (b) 13/2
(c) 169/9 (d) 13/3

4. If N = 10, ΣX = 60, ΣX² = 1,000, then standard deviation will be :
(a) 8 (b) 12
(c) 6 (d) 100

5. If σ = 16 and Σ( X - $\bar{X}$ )² = 4,096, then the value of n will be :
(a) 16 (b) 256
(c) 4 (d) None of these

6. The mean and standard deviation of 200 items are 48 and 3. The sum of the items and the sum of the squares of the items are respectively :
(a) 960 and 46,260 (b) 9,600 and 4,62,600
(c) 1,920 and 46,200 (d) 3,200 and 1,54,200

7. If standard deviation is 4, the number of items is 10 and the sum of the items is 160, then coefficient of variation will be :
(a) 16% (b) 25%
(c) 20% (d) 35%

8. The mean and coefficient of variation of a distribution are 100 and 35%. Standard deviation will be :
(a) 35 (b) 0.35
(c) 3.5 (d) 20/7

9. If Q₃ = 33 and Q₁ = 24, then the value of S.D. by the empirical relation will be :
(a) 4.5 (b) 6.75
(c) 3 (d) 3.6

[ Ans. 1. (b), 2. (c), 3. (a), 4. (a), 5. (a), 6. (b), 7. (b), 8. (a), 9. (b).]


SKEWNESS


— ° Test of Skewness 

— ° Measures of Skewness 

— ° Karl Pearson’s Coefficient of Skewness 

— * Bowley’s Coefficient of Skewness 

The lack of symmetry in a distribution is called skewness, that is, the tendency of a distribution to deviate from symmetry is called skewness. Its study gives us knowledge about the shape of the distribution. In a symmetrical distribution, Mean = Median = Mode. In such a distribution the frequencies rise to the maximum value in a definite and regular order and then fall in the same definite and regular order, that is, the items equidistant from the mean have the same frequency. For example :

X :  1  2  3  4  5  6  7  8  9
f :  2  3  5  7  8  7  5  3  2

In this distribution,

Arithmetic mean $\bar{X} = \frac{\sum fX}{\sum f} = \frac{210}{42} = 5$

Median = 5 and Mode = 5

There are two types of skewness :

(1) Positive skewness : In such a distribution, Arithmetic mean > Median > Mode. In this situation the curve is skewed to the right, that is, there is a long tail on the right side.

(2) Negative skewness : In such a distribution,

Arithmetic mean < Median < Mode

In this condition the curve of the distribution is skewed to the left, that is, there is a long tail on the left side.


TEST OF SKEWNESS 


It can be tested on the following basis whether a series is skewed 
or not: 


(1) On the basis of means : If the values of arithmetic mean, 
median and mode are the same then there will be no skewness in 
the series. If all these three means have different values, the 
skewness is present. The more difference among all these three 
reveals the more amount of skewness. 

(2) On the basis of sum of deviations : If the sum of deviations 
(positive and negative) in a series about mean, median or mode is 
zero then there is no skewness. The more is the sum of deviations, 
the more is the skewness. 

(3) On the basis of mode : If the sum of the frequencies on both 
sides of mode in a series is equal then the skewness is absent in 
that series. 

(4) On the basis of graphical representation : If a normal curve 
is obtained by plotting the data of a series on the graph paper and 
the curve, so plotted, is folded from the middle so that half of the 
curve covers rest of the curve perfectly then there will be no 
skewness. 

(5) On the basis of distance from the median : If the first and 
third quartiles (Q 1 and Q 3) are equidistant from the median then 
there will be no skewness in the series. 
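The first and third tests above can be checked mechanically. The following is only a minimal sketch applied to the symmetrical example given earlier (X = 1 to 9); the variable names are illustrative and not part of the text.

```python
from statistics import mean, median, mode

# Symmetrical distribution from the text: values 1..9 and their frequencies.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freqs = [2, 3, 5, 7, 8, 7, 5, 3, 2]

# Expand the frequency table into individual observations.
data = [x for x, f in zip(values, freqs) for _ in range(f)]

# Test (1): mean, median and mode coincide when there is no skewness.
avg, med, mod = mean(data), median(data), mode(data)

# Test (3): total frequency on each side of the mode is equal.
below = sum(f for x, f in zip(values, freqs) if x < mod)
above = sum(f for x, f in zip(values, freqs) if x > mod)

print("mean, median, mode:", avg, med, mod)
print("frequency below and above the mode:", below, above)
```

Both checks agree for this series, which is why it is described as symmetrical.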


MEASURES OF SKEWNESS 


There are the following measures of skewness : 


KARL PEARSON’S COEFFICIENT OF SKEWNESS 


If the arithmetic mean, median and mode of a frequency
distribution are the same then that frequency distribution will be
symmetrical. The more these averages diverge from one another,
the more skewed the distribution will be. Hence, the
difference between arithmetic mean and mode may be accepted as
a measure of skewness. Thus, the greater the absolute
difference between mean and mode, the more asymmetrical the
distribution, and vice-versa. On the basis of these facts, Karl
Pearson produced the following formula for the relative measure of
skewness :

Skewness, $Sk = \bar{X} - Z$

Coefficient of skewness, $J = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

$J = \frac{\bar{X} - Z}{\sigma}$

Theoretically, there is no limit for this measure, but in practice
it generally lies between − 1 and + 1.


Note : Sometimes it is impossible to find the value of the mode. Then the
following formula, given by Karl Pearson, which uses the median (M) in
place of the mode, may be used :

Coefficient of skewness, $J = \frac{3(\bar{X} - M)}{\sigma}$

This coefficient of skewness can take any value between − 3 and + 3.


Illustration 1. 
From the following data calculate coefficient of skewness : 


Years : 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 
Price Index of Wheat : 83 87 93 104 106 109 118 124 126 130 
Solution : 


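S.No.   Year   X     $X - \bar{X}$   $(X - \bar{X})^2$
1       1950   83    − 25            625
2       1951   87    − 21            441
3       1952   93    − 15            225
4       1953   104   − 4             16
5       1954   106   − 2             4
6       1955   109   1               1
7       1956   118   10              100
8       1957   124   16              256
9       1958   126   18              324
10      1959   130   22              484
Total          $\sum X$ = 1,080   -   2,476

Arithmetic mean, $\bar{X} = \frac{\sum X}{N} = \frac{1,080}{10} = 108$

Median, M = value of the $\frac{N+1}{2}$th item = value of the 5.5th item

$= \frac{\text{5th item} + \text{6th item}}{2} = \frac{106 + 109}{2} = \frac{215}{2} = 107.5$

Standard deviation, $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{2,476}{10}} = \sqrt{247.6} = 15.73$

$J = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \frac{3(108 - 107.5)}{15.73} = \frac{3 \times 0.5}{15.73} = 0.0954$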
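The calculation in Illustration 1 can be reproduced with a short script. This is a sketch only; the function name `pearson_median_skewness` is an assumption made for illustration. It uses the median-based formula with the standard deviation taken over N items, as in the solution above.

```python
from math import sqrt
from statistics import median

def pearson_median_skewness(data):
    """Return 3 * (mean - median) / standard deviation for an individual series."""
    n = len(data)
    mean = sum(data) / n
    # Standard deviation with divisor N, as in the worked solution.
    sd = sqrt(sum((x - mean) ** 2 for x in data) / n)
    return 3 * (mean - median(data)) / sd

# Price index of wheat, 1950 to 1959 (Illustration 1).
prices = [83, 87, 93, 104, 106, 109, 118, 124, 126, 130]
j = pearson_median_skewness(prices)
print(round(j, 3))
```

Small differences in the last decimal place can arise because the worked solution rounds the standard deviation to 15.73 before dividing.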


Illustration 2. 
From the following data find out the Karl Pearson’s coefficient of skewness : 


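Measurement 10 11 12 13 14 15
Frequency 2 4 10 8 5 1
Solution :

Measurement   Frequency   Deviation from assumed mean   $d_x^2$   $fd_x$   $fd_x^2$
(X)           (f)         (X − 12) = $d_x$
10            2           − 2                           4         − 4      8
11            4           − 1                           1         − 4      4
12            10          0                             0         0        0
13            8           + 1                           1         + 8      8
14            5           + 2                           4         + 10     20
15            1           + 3                           9         + 3      9
              N = 30                                              $\sum fd_x$ = + 13   $\sum fd_x^2$ = 49

Arithmetic mean, $\bar{X} = A + \frac{\sum fd_x}{N} = 12 + \frac{13}{30} = 12 + 0.43 = 12.43$

Mode from inspection, Z = 12

Standard deviation, $\sigma = \sqrt{\frac{\sum fd_x^2}{N} - \left(\frac{\sum fd_x}{N}\right)^2} = \sqrt{\frac{49}{30} - \left(\frac{13}{30}\right)^2} = \sqrt{1.63 - 0.19} = \sqrt{1.44} = 1.2$

$Sk = \bar{X} - Z = 12.43 - 12 = 0.43$

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{12.43 - 12}{1.2} = \frac{0.43}{1.2} = 0.3583$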


Illustration 3. 
Find out Karl Pearson’s coefficient of skewness from the following data : 


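Size 7.4 8.4 9.4 10.4 11.4 12.4 13.4
Frequency 2 6 20 14 8 6 4
Solution :

Size (x)   Frequency (f)   x − 10.4 = $d_x$   $fd_x$   $fd_x^2$
7.4        2               − 3                − 6      18
8.4        6               − 2                − 12     24
9.4        20              − 1                − 20     20
10.4       14              0                  0        0
11.4       8               1                  8        8
12.4       6               2                  12       24
13.4       4               3                  12       36
Total      N = 60          -                  $\sum fd_x$ = − 6   $\sum fd_x^2$ = 130

Mean, $\bar{X} = A + \frac{\sum fd_x}{N} = 10.4 + \frac{-6}{60} = 10.4 - 0.1 = 10.3$

Mode by inspection, Z = 9.4

Standard deviation, $\sigma = \sqrt{\frac{\sum fd_x^2}{N} - \left(\frac{\sum fd_x}{N}\right)^2} = \sqrt{\frac{130}{60} - \left(\frac{-6}{60}\right)^2} = \sqrt{2.17 - 0.01} = \sqrt{2.16} = 1.47$

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{10.3 - 9.4}{1.47} = \frac{0.9}{1.47} = 0.61$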
Illustration 4. 


Calculate skewness and Karl Pearson’s coefficient of skewness from the table 
given below : 


Life (in hours) 80—160 160—240 240—320 320—400 
No. of Tubes 24 90 45 12 
Life (in hours) 400—480 480—560 560—640 640—720 
No. of Tubes 30 120 39 30 

Solution : 

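Class      Frequency (f)   Mid-value (x)   $d' = \frac{x - 360}{80}$   $fd'$    $fd'^2$
80-160     24              120             − 3                         − 72     216
160-240    90              200             − 2                         − 180    360
240-320    45              280             − 1                         − 45     45
320-400    12              360             0                           0        0
400-480    30              440             1                           30       30
480-560    120             520             2                           240      480
560-640    39              600             3                           117      351
640-720    30              680             4                           120      480
Total      390             -               -                           210      1,962

Arithmetic mean, $\bar{X} = A + \frac{\sum fd'}{N} \times i = 360 + \frac{210}{390} \times 80 = 360 + 43.1 = 403.1$ hours

Mode : Maximum frequency $f_1$ = 120, so the modal class is 480-560.

Formula : $Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(L_2 - L_1)$

$= 480 + \frac{120 - 30}{2 \times 120 - 30 - 39}(560 - 480) = 480 + \frac{90}{171} \times 80 = 480 + \frac{7,200}{171} = 480 + 42.1 = 522.1$ hours

Standard deviation, $\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i = \sqrt{\frac{1,962}{390} - \left(\frac{210}{390}\right)^2} \times 80 = \sqrt{5.03 - 0.29} \times 80 = 2.18 \times 80 = 174.4$ hours

Skewness, $Sk = \bar{X} - Z = 403.1 - 522.1 = -119$

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{403.1 - 522.1}{174.4} = \frac{-119}{174.4} = -0.68$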
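The grouped-data steps of Illustration 4 (mean, interpolated mode, standard deviation and coefficient) can be sketched as follows. The variable names are illustrative assumptions. The script works directly with mid-values instead of step deviations, which gives the same mean and standard deviation.

```python
from math import sqrt

# Lower class limits and frequencies from Illustration 4 (class width 80).
lower = [80, 160, 240, 320, 400, 480, 560, 640]
freqs = [24, 90, 45, 12, 30, 120, 39, 30]
width = 80

n = sum(freqs)
mids = [l + width / 2 for l in lower]

# Mean and standard deviation (divisor N) from the mid-values.
mean = sum(f * m for f, m in zip(freqs, mids)) / n
sd = sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n)

# Mode by interpolation inside the class with the highest frequency.
k = freqs.index(max(freqs))
f1 = freqs[k]
f0 = freqs[k - 1] if k > 0 else 0
f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
mode = lower[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

skewness = mean - mode
j = skewness / sd
print(round(mean, 1), round(mode, 1), round(sd, 1), round(j, 2))
```

The unrounded standard deviation comes to about 174.2 hours, slightly below the 174.4 obtained above from the rounded factor 2.18; the coefficient is about − 0.68 either way.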
Illustration 5. 
Calculate Karl Pearson’s coefficient of skewness from the following data : 
(Vikram 2004; Bilaspur 2009) 


Marks (above) 0 10 20 30 40 50 60 70 80 
No. of Students 150 140 100 80 80 70 30 14 0 
Solution : 


Convert the cumulative frequency distribution into a simple frequency
distribution :

Marks      0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80 and above
Students   10     40      20      0       10      40      16      14      0

Marks          Frequency   Mid-value    Cum. frequency   (X − 35)   $d_x^2$   $fd_x$   $fd_x^2$
               (f)         (M.V.) (X)   (c.f.)           $d_x$
0-10           10          5            10               − 30       900       − 300    9,000
10-20          40          15           50               − 20       400       − 800    16,000
20-30          20          25           70               − 10       100       − 200    2,000
30-40          0           35           70               0          0         0        0
40-50          10          45           80               + 10       100       + 100    1,000
50-60          40          55           120              + 20       400       + 800    16,000
60-70          16          65           136              + 30       900       + 480    14,400
70-80          14          75           150              + 40       1,600     + 560    22,400
80 and above   0           85           150              + 50       2,500     0        0
               N = 150                                                        $\sum fd_x$ = + 640   $\sum fd_x^2$ = 80,800

Arithmetic mean, $\bar{X} = A + \frac{\sum fd_x}{N} = 35 + \frac{640}{150} = 35 + 4.27 = 39.27$ Marks

Median, M = value of the $\frac{N}{2}$th term = $\frac{150}{2}$th term = 75th term, which lies in the class 40-50.

Formula : $M = l_1 + \frac{i}{f}(m - c)$

$M = 40 + \frac{50 - 40}{10}(75 - 70) = 40 + \frac{10}{10} \times 5 = 40 + 5 = 45$ Marks

Since the frequencies are not systematic, the mode is not clear.

Standard deviation, $\sigma = \sqrt{\frac{\sum fd_x^2}{N} - \left(\frac{\sum fd_x}{N}\right)^2} = \sqrt{\frac{80,800}{150} - \left(\frac{640}{150}\right)^2} = \sqrt{538.67 - 18.20} = \sqrt{520.47} = 22.81$ Marks

Coefficient of skewness, $J = \frac{3(\bar{X} - M)}{\sigma} = \frac{3(39.27 - 45)}{22.81} = \frac{3 \times (-5.73)}{22.81} = \frac{-17.19}{22.81} = -0.754$


BOWLEY’S COEFFICIENT OF SKEWNESS 


We know that the first and third quartiles are equidistant from the median
in a symmetrical distribution. Hence the two quartiles will not be
equidistant from the median in an asymmetrical distribution. If there is
positive skewness in the distribution then the first quartile ($Q_1$) will be
closer to the median and the third quartile ($Q_3$) will be farther from the
median. On the contrary, in the case of negative skewness the first
quartile ($Q_1$) will be farther from the median and the third quartile ($Q_3$)
will be closer to the median. Thus, the difference between the
distances of the quartiles from the median is used to measure
skewness. Dr. Bowley produced the following formula, based on quartiles,
to determine skewness.

Bowley's coefficient of skewness,

$J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$

Bowley's coefficient of skewness can take any value between − 1
and + 1. A value of about 0.1 reveals moderate skewness and about 0.3
reveals marked skewness.
Illustration 6. 
Calculate Bowley’s coefficient of skewness for the following data : 


3 students get 3 marks each 5 students get 5 marks each 
8 students get 7 marks each 6 students get 8 marks each 


2 students get 10 marks each 


Solution : 
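Solve after arranging the data in the form of a frequency distribution :

Marks   Frequency (f)   Cumulative frequency (c.f.)
3       3               3
5       5               8
7       8               16
8       6               22
10      2               24
Total   24              -

$q_1 = \frac{N+1}{4} = \frac{24+1}{4} = \frac{25}{4} = 6.25$

$q_2 = \frac{2(N+1)}{4} = \frac{2(24+1)}{4} = \frac{50}{4} = 12.5$

$q_3 = \frac{3(N+1)}{4} = \frac{3(24+1)}{4} = \frac{75}{4} = 18.75$

$Q_1$ = value of the $q_1$th term = 6.25th term = 5

$Q_2$ = M = value of the $q_2$th term = 12.5th term = 7

$Q_3$ = value of the $q_3$th term = 18.75th term = 8

Bowley's coefficient of skewness,

$J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{8 + 5 - 2 \times 7}{8 - 5} = \frac{13 - 14}{3} = \frac{-1}{3} = -0.33$ (Negative skewness)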
Illustration 7. 
Calculate Bowley’s coefficient of skewness from the following data : 
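Class 15-25 25-35 35-45 45-55 55-65 65-75
Frequency 1 3 7 11 5 3
(Bilaspur 2005)
Solution :

Class   Frequency (f)   Cumulative frequency (c.f.)
15-25   1               1
25-35   3               4
35-45   7               11
45-55   11              22
55-65   5               27
65-75   3               30
Total   30              -

$q_1 = \frac{N}{4} = \frac{30}{4} = 7.5$

$q_2 = \frac{2N}{4} = \frac{2 \times 30}{4} = 15$

$q_3 = \frac{3N}{4} = \frac{3 \times 30}{4} = 22.5$

First Quartile, $Q_1 = L_1 + \frac{q_1 - c}{f} \times i = 35 + \frac{7.5 - 4}{7} \times 10 = 35 + \frac{35}{7} = 35 + 5 = 40$

Second Quartile, $Q_2 = L_1 + \frac{q_2 - c}{f} \times i = 45 + \frac{15 - 11}{11} \times 10 = 45 + \frac{4 \times 10}{11} = 45 + 3.64 = 48.64$

Third Quartile, $Q_3 = L_1 + \frac{q_3 - c}{f} \times i = 55 + \frac{22.5 - 22}{5} \times 10 = 55 + \frac{0.5 \times 10}{5} = 55 + 1 = 56$

$J_Q = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{56 + 40 - 2 \times 48.64}{56 - 40} = \frac{96 - 97.28}{16} = \frac{-1.28}{16} = -0.08$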
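The interpolation used for the quartiles of a continuous series can be sketched as one small function. The function name `grouped_quantile` and its argument layout are assumptions made for illustration; the data are those of Illustration 7.

```python
def grouped_quantile(lower, freqs, width, position):
    """Interpolate the value at a cumulative position: L1 + (q - c) / f * i."""
    cum = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower, freqs):
        if f > 0 and cum + f >= position:
            return l + (position - cum) / f * width
        cum += f
    raise ValueError("position exceeds total frequency")

# Illustration 7: classes 15-25, 25-35, ..., 65-75.
lower = [15, 25, 35, 45, 55, 65]
freqs = [1, 3, 7, 11, 5, 3]
n = sum(freqs)

q1 = grouped_quantile(lower, freqs, 10, n / 4)
q2 = grouped_quantile(lower, freqs, 10, 2 * n / 4)
q3 = grouped_quantile(lower, freqs, 10, 3 * n / 4)

# Bowley's coefficient of skewness.
jq = (q3 + q1 - 2 * q2) / (q3 - q1)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(jq, 2))
```

The same function can be reused for the less-than and mid-value series in the following illustrations once the class limits have been written out.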


Illustration 8. 
Calculate coefficient of skewness using quartiles : 


Mid-values 15 20 25 30 35 40 
Frequency 30 28 25 24 10 21 


Solution : 
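Here, Class interval = 25 − 20 = 20 − 15 = 5

∴ First Class = $\left(15 - \frac{5}{2}\right)$ to $\left(15 + \frac{5}{2}\right)$, i.e., 12.5-17.5

Similarly prepare the frequency distribution by taking the other classes :

Class       Frequency (f)   Cumulative frequency (c.f.)
12.5-17.5   30              30
17.5-22.5   28              58
22.5-27.5   25              83
27.5-32.5   24              107
32.5-37.5   10              117
37.5-42.5   21              138
Total       138             -

$q_1 = \frac{N}{4} = \frac{138}{4} = 34.5$

$q_2 = \frac{2N}{4} = \frac{2 \times 138}{4} = 69$

$q_3 = \frac{3N}{4} = \frac{3 \times 138}{4} = 103.5$

$Q_1 = L_1 + \frac{q_1 - c}{f} \times i = 17.5 + \frac{34.5 - 30}{28} \times 5 = 17.5 + \frac{22.5}{28} = 17.5 + 0.8 = 18.3$

$Q_2 = L_1 + \frac{q_2 - c}{f} \times i = 22.5 + \frac{69 - 58}{25} \times 5 = 22.5 + \frac{55}{25} = 22.5 + 2.2 = 24.7$

$Q_3 = L_1 + \frac{q_3 - c}{f} \times i = 27.5 + \frac{103.5 - 83}{24} \times 5 = 27.5 + \frac{20.5 \times 5}{24} = 27.5 + 4.3 = 31.8$

$J_Q = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{31.8 + 18.3 - 2 \times 24.7}{31.8 - 18.3} = \frac{0.7}{13.5} = 0.05$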


Illustration 9. 
Calculate Bowley’s coefficient of skewness from the following data : 
Life (in months) No. of Bulbs 
Less than 87.5 35 


Less than 112.5 75 
Less than 137.5 123 
Less than 162.5 223 
Less than 187.5 348 
Less than 212.5 428 


Less than 237.5 478 
Less than 262.5 500 


Solution : 


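Class         Frequency (f)   Cumulative frequency (c.f.)
62.5-87.5     35              35
87.5-112.5    40              75
112.5-137.5   48              123
137.5-162.5   100             223
162.5-187.5   125             348
187.5-212.5   80              428
212.5-237.5   50              478
237.5-262.5   22              500
Total         500             -

$q_1 = \frac{N}{4} = \frac{500}{4} = 125$

$M = q_2 = \frac{2N}{4} = \frac{2 \times 500}{4} = 250$

$q_3 = \frac{3N}{4} = \frac{3 \times 500}{4} = 375$

$Q_1 = L_1 + \frac{q_1 - c}{f} \times i = 137.5 + \frac{125 - 123}{100} \times 25 = 137.5 + \frac{50}{100} = 137.5 + 0.5 = 138$

$Q_3 = L_1 + \frac{q_3 - c}{f} \times i = 187.5 + \frac{375 - 348}{80} \times 25 = 187.5 + \frac{675}{80} = 187.5 + 8.44 = 195.94$

$M = L_1 + \frac{q_2 - c}{f} \times i = 162.5 + \frac{250 - 223}{125} \times 25 = 162.5 + \frac{675}{125} = 162.5 + 5.4 = 167.9$

$J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{195.94 + 138 - 2 \times 167.9}{195.94 - 138} = \frac{333.94 - 335.8}{57.94} = \frac{-1.86}{57.94} = -0.03$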
Comparison of Two Distributions 


Illustration 10. 
Calculate Bowley’s Coefficient of Skewness for the two series and point out 
which series is more skewed ? 


Series A : 30 40 33 28 35 31 36 41 45 30 34 
Series B : 5 8 12 17 19 20 26 30 36 43 49

Solution :

Series A : Ascending order,
28, 30, 31, 33, 33, 34, 35, 36, 40, 41, 45

First Quartile, $Q_1$ = value of the $\frac{N+1}{4}$th term = $\frac{11+1}{4}$th term = third term = 31

Median, M = value of the $\frac{2(N+1)}{4}$th term = $\frac{2(11+1)}{4}$th term = 6th term = 34

Third Quartile, $Q_3$ = value of the $\frac{3(N+1)}{4}$th term = $\frac{3(11+1)}{4}$th term = 9th term = 40

Bowley's coefficient of skewness,

$J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{40 + 31 - 2 \times 34}{40 - 31} = \frac{3}{9} = 0.33$ ....(i)
Series B : Ascending order,
5, 8, 12, 17, 19, 20, 26, 30, 36, 43, 49

First Quartile, $Q_1$ = value of the $\frac{N+1}{4}$th term = $\frac{11+1}{4}$th term = third term = 12

Median, M = value of the $\frac{2(N+1)}{4}$th term = $\frac{2(11+1)}{4}$th term = 6th term = 20

Third Quartile, $Q_3$ = value of the $\frac{3(N+1)}{4}$th term = $\frac{3(11+1)}{4}$th term = 9th term = 36

Bowley's coefficient of skewness,

$J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{36 + 12 - 2 \times 20}{36 - 12} = \frac{48 - 40}{24} = \frac{8}{24} = 0.33$ ....(ii)

Since the coefficient of skewness is positive and the same (0.33) for
both series, both series are positively skewed to the same degree.
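For an individual series the quartile positions (N + 1)/4, 2(N + 1)/4 and 3(N + 1)/4 can be located programmatically. The sketch below uses Series B. The function name is illustrative, and the linear interpolation for fractional positions is an assumption that the illustration itself does not need, since all three positions are whole numbers here. The series is assumed to have at least three items.

```python
def bowley_individual(series):
    """Return (Q1, M, Q3, J_Q) for an individual series of at least 3 items."""
    data = sorted(series)
    n = len(data)

    def item(position):
        # Value at a 1-based position; interpolate if the position is fractional.
        whole = int(position)
        frac = position - whole
        if frac == 0 or whole >= n:
            return data[min(whole, n) - 1]
        return data[whole - 1] + frac * (data[whole] - data[whole - 1])

    q1 = item((n + 1) / 4)
    m = item(2 * (n + 1) / 4)
    q3 = item(3 * (n + 1) / 4)
    return q1, m, q3, (q3 + q1 - 2 * m) / (q3 - q1)

series_b = [5, 8, 12, 17, 19, 20, 26, 30, 36, 43, 49]
q1, m, q3, jq = bowley_individual(series_b)
print(q1, m, q3, round(jq, 2))
```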


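Illustration 11.

Find Karl Pearson's coefficient of skewness for the two series and point out
which one is more skewed :

Age (in years)   No. of Children
                 School A   School B
6                3          1
8                9          10
9                15         9
10               8          7
11               5          3
Total            40         30

Solution :

Mode by inspection : 9 years for school A and 8 years for school B.

Calculation of Mean and Standard Deviation for School A

X       f    fX    $X - \bar{X}$   $(X - \bar{X})^2$   $f(X - \bar{X})^2$
6       3    18    − 3             9                   27
8       9    72    − 1             1                   9
9       15   135   0               0                   0
10      8    80    1               1                   8
11      5    55    2               4                   20
Total   40   360   -               -                   64

$\bar{X} = \frac{\sum fX}{N} = \frac{360}{40} = 9$ years

$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}} = \sqrt{\frac{64}{40}} = \sqrt{1.6} = 1.26$ years

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{9 - 9}{1.26} = 0$ ....(1)

= No skewness

Calculation of Mean and Standard Deviation for School B

X       f    fX    $X - \bar{X}$   $(X - \bar{X})^2$   $f(X - \bar{X})^2$
6       1    6     − 3             9                   9
8       10   80    − 1             1                   10
9       9    81    0               0                   0
10      7    70    1               1                   7
11      3    33    2               4                   12
Total   30   270   -               -                   38

$\bar{X} = \frac{\sum fX}{N} = \frac{270}{30} = 9$ years

$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}} = \sqrt{\frac{38}{30}} = \sqrt{1.27} = 1.13$ years

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{9 - 8}{1.13} = 0.89$ ....(2)

= Positive skewness (the mean exceeds the mode)

It is clear from equations (1) and (2) that there is more skewness in
the age distribution of school B.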


MISCELLANEOUS ILLUSTRATIONS 


Illustration 12.

If coefficient of skewness is 0.6, the sum of two quartiles is 100 and median is
38, find the quartile deviation coefficient. (Bhopal 2005)
Solution :

Coefficient of skewness, $J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$

Putting the known values, $0.6 = \frac{100 - 2 \times 38}{Q_3 - Q_1}$

⇒ $Q_3 - Q_1 = \frac{24}{0.6} = 40$

∴ Coefficient of Quartile deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40}{100} = \frac{2}{5} = 0.4$
Illustration 13. 


If coefficient of skewness = — 0.4, mean = 45 and median = 48, find the 
standard deviation and coefficient of standard deviation. (Vikram 2005) 
Solution : 


Coefficient of skewness, $J = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$

Putting the known values,

⇒ $-0.4 = \frac{3(45 - 48)}{\text{Standard deviation}}$

⇒ Standard deviation $= \frac{3 \times (-3)}{-0.4} = \frac{-9}{-0.4} = 22.5$

Coefficient of standard deviation $= \frac{\sigma}{\bar{X}} = \frac{22.5}{45} = 0.5$ or 50%


Illustration 14. 


If Bowley's coefficient of skewness is − 0.36, $Q_1$ = 8.6 and M = 12.3, then find
$Q_3$ and quartile coefficient of dispersion.

Solution :

Coefficient of skewness, $J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$

Putting the known values,

⇒ $-0.36 = \frac{Q_3 + 8.6 - 2 \times 12.3}{Q_3 - 8.6}$

⇒ $-0.36(Q_3 - 8.6) = Q_3 + 8.6 - 24.6$

⇒ $-0.36 Q_3 + 3.096 = Q_3 - 16$

⇒ $-0.36 Q_3 - Q_3 = -16 - 3.096$

⇒ $-1.36 Q_3 = -19.096$ ∴ $Q_3 = \frac{19.096}{1.36} = 14.04$

Quartile coefficient of dispersion $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{14.04 - 8.6}{14.04 + 8.6} = \frac{5.44}{22.64} = 0.24$
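The rearrangement used above can be written once and reused for similar problems. This is a minimal sketch; the function name `third_quartile_from_bowley` is an assumption for illustration.

```python
def third_quartile_from_bowley(jq, q1, m):
    """Solve J_Q = (Q3 + Q1 - 2M) / (Q3 - Q1) for Q3.

    J_Q * (Q3 - Q1) = Q3 + Q1 - 2M  gives  Q3 * (J_Q - 1) = Q1 * (1 + J_Q) - 2M.
    """
    return (q1 * (1 + jq) - 2 * m) / (jq - 1)

# Values given in Illustration 14.
q1, m, jq = 8.6, 12.3, -0.36
q3 = third_quartile_from_bowley(jq, q1, m)

# Quartile coefficient of dispersion.
coeff_qd = (q3 - q1) / (q3 + q1)
print(round(q3, 2), round(coeff_qd, 2))
```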


Illustration 15. 

If mode is more than mean by 4.5 and the variance is 121, find coefficient of 
skewness. 
Solution : 


Mode − Mean = 4.5 ⇒ Mean − Mode = − 4.5

Variance = 121 ⇒ Standard deviation, $\sigma = \sqrt{121} = 11$

Coefficient of Skewness $= \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{-4.5}{11} = -0.41$


Illustration 16. 

In a frequency distribution, Karl Pearson’s coefficient of skewness is + 0.4, 
standard deviation is 6.5 and mean is 29.6. Find the mode. (Jabalpur 2008) 
Solution : 

Karl Pearson's coefficient of skewness,

$J = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

⇒ $0.4 = \frac{29.6 - \text{Mode}}{6.5}$

⇒ 0.4 × 6.5 = 29.6 − Mode

⇒ 2.6 = 29.6 − Mode

∴ Mode = 29.6 − 2.6 = 27
Illustration 17. 
In a certain distribution the following results were obtained : 
C. V. = 40%, X = 25, Z = 20 
Find out coefficient of skewness. (Jabalpur 2004; Bhopal 2004; 
Vikram 2006) 


Solution : 
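C. V. $= \frac{\sigma}{\bar{X}} \times 100 = 40$ (given)

⇒ $\sigma = \frac{40 \times \bar{X}}{100} = \frac{40 \times 25}{100} = 10$ [∵ $\bar{X}$ = 25]

Coefficient of Skewness $= \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{X} - Z}{\sigma} = \frac{25 - 20}{10} = \frac{5}{10} = 0.5$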
Illustration 18. 

In a distribution the difference of two quartiles is 15, their sum is 35 and median 
is 20. Find the coefficient of skewness. (Bhopal 2005; Jabalpur 2009) 
Solution : 

Given : $Q_3 - Q_1 = 15$ ....(1)

$Q_3 + Q_1 = 35$ ....(2)

Median, M = 20 ....(3)

∴ Coefficient of Skewness, $J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{35 - (2 \times 20)}{15} = \frac{35 - 40}{15} = \frac{-5}{15} = -\frac{1}{3} = -0.33$


Illustration 19. 

If difference of two quartiles = 32, sum of two quartiles = 76, median = 48, then 
calculate 
(a) coefficient of skewness, and (b) Separate values of first and third quartiles. 
(Garwal 2006) 
Solution : 


Given : $Q_3 - Q_1 = 32$ ....(1)

$Q_3 + Q_1 = 76$ ....(2)

and Median, M = 48 ....(3)

(a) Coefficient of skewness, $J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{76 - 2 \times 48}{32} = \frac{76 - 96}{32} = \frac{-20}{32} = \frac{-5}{8} = -0.625$

(b) Adding equations (1) and (2),

$2Q_3 = 108$ ∴ $Q_3 = \frac{108}{2} = 54$

Subtracting equation (1) from equation (2),

$2Q_1 = 44$ ∴ $Q_1 = \frac{44}{2} = 22$


Illustration 20. 
The marks obtained by 40 students in an examination are as follows : 
Marks Students 
0—4 4 
5—9 11 
10—14 16 
15-19 6
20—24 3 


Calculate median, quartile deviation and Bowley’s coefficient of Skewness. 
(Ravishankar 2008) 
Solution : 


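Class   f    c.f.
0-4     4    4
5-9     11   15
10-14   16   31
15-19   6    37
20-24   3    40
Total   40   -

The classes are inclusive, so the class boundaries 4.5, 9.5, 14.5, ... are used. Here $q_1 = \frac{N}{4} = 10$, $q_2 = \frac{2N}{4} = 20$ and $q_3 = \frac{3N}{4} = 30$.

$Q_1 = L_1 + \frac{q_1 - c}{f}(L_2 - L_1) = 4.5 + \frac{10 - 4}{11} \times (9.5 - 4.5) = 4.5 + \frac{30}{11} = 4.5 + 2.73 = 7.23$ marks

$M = Q_2 = L_1 + \frac{q_2 - c}{f}(L_2 - L_1) = 9.5 + \frac{20 - 15}{16} \times (14.5 - 9.5) = 9.5 + \frac{25}{16} = 9.5 + 1.56 = 11.06$ marks

$Q_3 = L_1 + \frac{q_3 - c}{f}(L_2 - L_1) = 9.5 + \frac{30 - 15}{16} \times (14.5 - 9.5) = 9.5 + \frac{75}{16} = 9.5 + 4.69 = 14.19$ marks

Q.D. $= \frac{Q_3 - Q_1}{2} = \frac{14.19 - 7.23}{2} = \frac{6.96}{2} = 3.48$ marks

$J_Q = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{14.19 + 7.23 - 2 \times 11.06}{14.19 - 7.23} = \frac{21.42 - 22.12}{6.96} = \frac{-0.70}{6.96} = -0.1$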


Illustration 21. 
Calculate Karl Pearson’s and Bowley’s coefficient of skewness from the 
following data : 


Class 0—5 5—10 10—15 15—20 20—25 25—30 


Frequency 5 7 10 16 4 4
(Vikram 2009) 


Solution : 
Calculate Karl Pearson’s Coefficient of Skewness 
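Class   f    Mid-value (x)   $d' = \frac{x - 17.5}{5}$   $fd'$   $fd'^2$
0-5     5    2.5             − 3                         − 15    45
5-10    7    7.5             − 2                         − 14    28
10-15   10   12.5            − 1                         − 10    10
15-20   16   17.5            0                           0       0
20-25   4    22.5            1                           4       4
25-30   4    27.5            2                           8       16
Total   46   -               -                           − 27    103

Mean, $\bar{X} = A + \frac{\sum fd'}{N} \times i = 17.5 + \frac{-27}{46} \times 5 = 17.5 - 2.93 = 14.57$

Standard deviation, $\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i = \sqrt{\frac{103}{46} - \left(\frac{-27}{46}\right)^2} \times 5 = \sqrt{2.24 - (0.59)^2} \times 5$

$= \sqrt{2.24 - 0.35} \times 5 = \sqrt{1.89} \times 5 = 1.37 \times 5 = 6.85$

Mode, $Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 15 + \frac{16 - 10}{2 \times 16 - 10 - 4} \times 5 = 15 + \frac{30}{18} = 15 + 1.67 = 16.67$

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{14.57 - 16.67}{6.85} = \frac{-2.10}{6.85} = -0.31$

Calculate Bowley's Coefficient of Skewness

Class   Frequency (f)   c.f.
0-5     5               5
5-10    7               12
10-15   10              22
15-20   16              38
20-25   4               42
25-30   4               46

Here $q_1 = \frac{N}{4} = 11.5$, $q_2 = \frac{2N}{4} = 23$ and $q_3 = \frac{3N}{4} = 34.5$.

$Q_1 = L_1 + \frac{q_1 - c}{f} \times i = 5 + \frac{11.5 - 5}{7} \times 5 = 5 + \frac{6.5 \times 5}{7} = 5 + 4.64 = 9.64$

$M = Q_2 = L_1 + \frac{q_2 - c}{f} \times i = 15 + \frac{23 - 22}{16} \times 5 = 15 + \frac{5}{16} = 15 + 0.31 = 15.31$

$Q_3 = L_1 + \frac{q_3 - c}{f} \times i = 15 + \frac{34.5 - 22}{16} \times 5 = 15 + \frac{62.5}{16} = 15 + 3.91 = 18.91$

$J_Q = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{18.91 + 9.64 - 2 \times 15.31}{18.91 - 9.64} = \frac{28.55 - 30.62}{9.27} = \frac{-2.07}{9.27} = -0.22$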


Illustration 22. 

Calculate Karl Pearson’s coefficient of skewness from the following information 
given its mode 54 and arithmetic mean 53.4 : (Indore 2005) 

Marks 0—20 20—40 40—60 60—80 80—100 Total 

No. of Students 10 — 30 — 14 94 
Solution : 


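Let the unknown frequencies be a and b.

Marks    f                 Mid-value (x)   fx
0-20     10                10              100
20-40    a                 30              30a
40-60    30                50              1,500
60-80    b                 70              70b
80-100   14                90              1,260
Total    54 + a + b = 94   -               2,860 + 30a + 70b

54 + a + b = 94
⇒ a + b = 94 − 54
⇒ a + b = 40
⇒ a = 40 − b ....(1)

$\bar{X} = \frac{\sum fx}{N}$

⇒ $53.4 = \frac{2,860 + 30a + 70b}{94}$

⇒ 94 × 53.4 = 2,860 + 30a + 70b

⇒ 5,019.6 = 2,860 + 30a + 70b

⇒ 30a + 70b = 5,019.6 − 2,860

⇒ 30a + 70b = 2,159.6 ....(2)

⇒ 30(40 − b) + 70b = 2,159.6 [From (1)]

⇒ 1,200 − 30b + 70b = 2,159.6

⇒ 40b = 2,159.6 − 1,200 = 959.6

∴ $b = \frac{959.6}{40} = 23.99 \approx 24$ ....(3)

⇒ a = 40 − 24 [From (1) and (3)]

∴ a = 16

Calculation of Standard Deviation

Mid-value (x)   Frequency (f)   $d' = \frac{x - 50}{20}$   $fd'$   $fd'^2$
10              10              − 2                        − 20    40
30              16              − 1                        − 16    16
50              30              0                          0       0
70              24              1                          24      24
90              14              2                          28      56
Total           94              -                          16      136

$\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i = \sqrt{\frac{136}{94} - \left(\frac{16}{94}\right)^2} \times 20 = \sqrt{1.45 - 0.03} \times 20 = \sqrt{1.42} \times 20 = 1.19 \times 20 = 23.8$

Karl Pearson's coefficient of skewness,

$J = \frac{\bar{X} - Z}{\sigma} = \frac{53.4 - 54}{23.8} = \frac{-0.6}{23.8} = -0.025$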
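The two missing frequencies in Illustration 22 come from two linear equations, one for the total frequency and one for the mean. The sketch below solves them directly; the function name and argument layout are assumptions made for illustration.

```python
def missing_frequencies(known, mids_unknown, total, mean):
    """Solve for two unknown class frequencies (a, b).

    known        : list of (mid-value, frequency) for the classes that are given
    mids_unknown : mid-values of the two classes with unknown frequency
    total        : total frequency N
    mean         : arithmetic mean of the distribution
    """
    known_n = sum(f for _, f in known)
    known_fx = sum(m * f for m, f in known)
    rest_n = total - known_n            # a + b
    rest_fx = total * mean - known_fx   # m_a * a + m_b * b
    m_a, m_b = mids_unknown
    b = (rest_fx - m_a * rest_n) / (m_b - m_a)
    a = rest_n - b
    return a, b

a, b = missing_frequencies([(10, 10), (50, 30), (90, 14)], (30, 70), 94, 53.4)
print(round(a), round(b))
```

The raw solution is fractional (b is about 23.99) because the given mean is rounded, so the frequencies are rounded to whole numbers as in the solution above.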


Exercise 9 (A) 
Karl Pearson’s Coefficient of Skewness 
1. Calculate Karl Pearson's skewness and coefficient of skewness from the
following data :
13, 16, 19, 17, 20, 25, 25, 27, 28, 30
[ Ans. $\bar{X}$ = 22, Z = 25, σ = 5.46, J = − 0.55]
2. Calculate Karl Pearson’s coefficient of skewness from the following data : 


Measurement : 6 7 8 9 10 11 12
Frequency : 3 6 9 13 8 5 4

[ Ans. $\bar{X}$ = 9, Z = 9, σ = 1.61, J = 0]


3. Find out the Karl Pearson’s coefficient of skewness from the following data : 
Height (in cm) 75 76 77 78 79 80 81 82 83 
No. of Students 6 8 13 18 20 16 10 7 2 

(Kanpur 2006) 


[ Ans. $\bar{X}$ = 78.73 cm, Z = 79 cm, σ = 1.95 cm, J = − 0.14]


4. From the following data, find out Karl Pearson’s coefficient of skewness : 

35 persons get @ ₹ 4.5 per person 40 persons get @ ₹ 5.5 per person

48 persons get @ ₹ 6.5 per person 100 persons get @ ₹ 7.5 per person

125 persons get @ ₹ 8.5 per person 87 persons get @ ₹ 9.5 per person

43 persons get @ ₹ 10.5 per person 22 persons get @ ₹ 11.5 per person
(Garwal 2006) [ Ans. $\bar{X}$ = 8.07, Z = 8.5, σ = 1.775, J = − 0.242]

5. Calculate Karl Pearson’s coefficient of skewness from the following data : 


Class : 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency : 5 10 18 25 20 15 7

[ Ans. $\bar{X}$ = 36.8, Z = 35.83, σ = 15.58, J = 0.062]


6. Find the Karl Pearson’s coefficient of skewness from the following data : 
Age (in years) : 10—20 20—30 30—40 40—50 50—60 
No. of Persons : 18 20 30 22 10

(Ravishankar 2006)

[ Ans. Mean $\bar{X}$ = 33.6 years, Mode Z = 35.56 years, standard deviation σ =
12.33 years, J = − 0.159]


7. Calculate Karl Pearson's coefficient of skewness from the following data :
Class : 0-6 6-12 12-18 18-24 24-30 30-36
Frequency : 12 24 38 52 34 19

[ Ans. $\bar{X}$ = 19.32, Z = 20.62, σ = 8.22, J = − 0.158]


8. From the data given below, find Karl Pearson’s coefficient of skewness : 
Marks (less than) : 20 30 40 50 60 70 80 90 100 
Frequency : 3 14 30 60 75 85 92 97 100 


[ Ans. $\bar{X}$ = 49.4, Z = 44.83, σ = 18.47, J = 0.25]


9. From the following data, find Karl Pearson’s coefficient of skewness : 
Marks (less than) : 10 20 30 40 50 60 70 80 
Students : 5 15 30 50 80 100 120 125 


[ Ans. $\bar{X}$ = 43, Z = 45, σ = 17.66, J = − 0.113] (Indore 2006; Bhopal 2008)


10. Calculate Karl Pearson's coefficient of skewness from the following data :
Marks (more than) : 5 15 25 35 45 55 65 75 85
Students : 120 105 96 85 72 58 32 12 0

[ Ans. $\bar{X}$ = 48.33, Z = 61.67, σ = 22.15, J = − 0.602]


11. Calculate Karl Pearson’s coefficient of skewness based on the following data 


Central Value : 15 20 25 30 35 40 
Frequency : 15 17 19 27 19 12 


[ Ans. $\bar{X}$ = 27.5, Z = 30, σ = 7.75, J = − 0.32] (Ravishankar 2008)


12. Calculate Karl Pearson’s coefficient of skewness from the following data : 
Marks No. of Students 


less than 20 8 
less than 30 38 


less than 40 53 
40—50 2 
50—60 8 
60—70 30 


more than 70 17 
more than 80 4 


[ Ans. $\bar{X}$ = 47.18, M = 50, σ = 21.548, J = − 0.39]
Bowley’s Coefficient of Skewness 


13. Calculate Bowley’s coefficient of skewness from the following data : 
45, 47, 55, 53, 46, 47, 56, 58, 45, 50, 57 


[ Ans. $Q_1$ = 46, $Q_2$ = 50, $Q_3$ = 56, $J_Q$ = 0.2]


14. Find Bowley’s coefficient of skewness from the following data : 
Measurement: 10 11 12 13 14 15 


Frequency : 2 4 10 8 5 1
[ Ans. $Q_1$ = 12, M = $Q_2$ = 12, $Q_3$ = 13, $J_Q$ = 1]
15. Calculate Bowley’s coefficient of skewness from the following data : 


Class : 7-15 15-23 23-31 31-39 39-47
Frequency : 20 10 6 15 19

[ Ans. $Q_1$ = 14, $Q_2$ = 29.67, $Q_3$ = 39.63, $J_Q$ = − 0.22]


16. Calculate coefficient of skewness based on quartiles from the following 


frequency distribution : 
Class : 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency : 10 40 20 0 10 40 16 14

[ Ans. $Q_1$ = 16.875, $Q_2$ = 45, $Q_3$ = 58.125, $J_Q$ = − 0.36]


17. Calculate quartiles from the following data and find coefficient of skewness 


based on quartiles : 
Class-Interval : 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency : 5 9 14 20 25 15 8 4

[ Ans. $Q_1$ = 37.4, $Q_2$ = 50.3, $Q_3$ = 60.8, $J_Q$ = − 0.103]
18. Calculate Bowley’s coefficient of skewness from the following data : 
Mid-value : 75 100 125 150 175 200 225 250 


Frequency : 35 40 48 100 125 80 50 22 
(Bilaspur 2008) 


[ Ans. $Q_1$ = 138, $Q_2$ = M = 167.9, $Q_3$ = 195.94, $J_Q$ = − 0.032]
19. Calculate Bowley’s coefficient of skewness from the following data : 
Marks (more than) : 0 15 30 45 60 75 90 105 


No. of Students : 150 140 100 80 70 30 14 0
(Indore 2007)

[ Ans. $Q_1$ = 25.31, $Q_2$ = M = 52.5, $Q_3$ = 72.19, $J_Q$ = − 0.16]
20. Calculate Bowley’s coefficient of skewness from the following data : 
Income (in Rupees) No. of Persons 

Less than 500 15 

Less than 600 28 


Less than 700 45 
Less than 800 70 
Less than 900 100 
Less than 1,000 120 
Less than 1,100 130 
[ Ans. $J_Q$ = − 0.16]


21. Calculate coefficient of quartile deviation and Bowley’s coefficient of 


skewness from the following data : 
Age (in years) : 20 30 40 50 60 70 80 
No. of Students : 3 61 132 153 140 51 3 


[ Ans. $Q_1$ = 40, M = 50, $Q_3$ = 60, Quartile deviation coefficient = 10, Bowley's
coefficient of skewness = 0]
22. Find Karl Pearson’s coefficient of skewness and coefficient of variation from 


the following data : 
Marks (more than) : 0 10 20 30 40 50 60 70 
No. of Students : 100 90 75 50 20 10 5 0

[ Ans. $\bar{X}$ = 30, σ = 15, Z = 32, J = − 0.13, C. V. = 0.5 or 50%]
23. Compute coefficient of quartile deviation, quartile coefficient of skewness and 
Karl Pearson’s 


coefficient of skewness from the following data : 
Marks (less than) : 10 20 30 40 50 60 70 80 
Frequency : 15 35 50 70 110 125 165 200 


[ Ans. Quartile deviation coefficient = 0.38, $J_Q$ = 0.034, J = − 0.135 (on
Median)]
24. Find out quartile deviation, mean deviation and coefficient of skewness from 


the following data : 
Height : 58 59 60 61 62 63 64 65 66 
Students : 15 20 32 35 33 22 20 10 8 


[ Ans. Q.D. = 1.5, M.D. = 1.76, $J_Q$ = + 0.33]


THEORETICAL QUESTIONS 


Long Answer Questions 

1. Define skewness. Discuss the various measures of skewness. Giving 
examples where necessary. 

2. What do you understand by skewness ? How does it differ from dispersion ? 
Discuss the various methods of measuring skewness. (Vikram 2007, 09) 

3. What is skewness ? Give various formulae for the calculation of coefficient of 
skewness. 


4. Write an essay on the following : 
(i) Skewness 
(ii) Difference between skewness and dispersion 


OBJECTIVE QUESTIONS 


Choose the Correct Answer 
1. The limits of Bowley's coefficient of skewness are :
(a) ± 1 (b) ± 2
(c) ± 3 (d) None of these

2. Who of the following did not discuss skewness ?
(a) Croxton (b) Karl Pearson
(c) Bowley (d) Kelly


3. Formula of Karl Pearson's coefficient of skewness is :
X X-Z 
(a) J “ (b) va 
a= 
(c) J= > (d) = 
4. In the following the false statement is : 
(a) Frequency distributions are always asymmetrical. 


(b) If median = 24 and mean = 2.6 then skewness is positive in the frequency 
distribution. 


(c) Skewness is positive if ($Q_3 - Q_2$) > ($Q_2 - Q_1$).
(d) Kelly's coefficient of skewness is based on percentiles.


[ Ans. :1. (a), 2. (a), 3. (b), 4. (a)] 
ANALYSIS OF TIME SERIES

• Meaning and Importance
• Components of time series
• Decomposition of time series
• Techniques of time series analysis
• Measurement of secular trend
• Measurement of cyclical variations/fluctuations
• Measurement of irregular variations/fluctuations


MEANING AND DEFINITIONS 


Time series refers to a series of data in which one variable is
time. In other words, when the values of a variable are arranged
chronologically over successive points of time, the arrangement is said
to be a time series. For example, if the data relating to the national
income of India are presented as a chronological series from the year
1950-51 to the year 2000-01, this series of national income is said to
be a time series. Similarly, if the data relating to food production,
industrial production or imports and exports are presented over various
points of time, these series will also be called time
series. In a time series, time is the independent variable and the
data presented over the various points of time are the dependent variable.
The unit of measurement of time may be a year, month, week, day,
etc.


Definition of time series : A few definitions of time series are 
given below : 

According to Kenney and Keeping, "A set of data depending on
time is called a time series."

According to Croxton and Cowden, "A time series consists of
data arranged chronologically."

According to W.Z. Hirsch, "A time series is a sequence of values
of the same variable corresponding to successive points of time."

According to Spiegel, "A time series is a set of observations
taken at specified times, usually at equal intervals." Mathematically,
"a time series is defined by the values $Y_1, Y_2, \ldots$ of a variable
Y at times $t_1, t_2, \ldots$ Thus Y is a function of t. Symbolically, Y =
F(t)."

According to Ya-lun Chou, "A time series may be defined as a
collection of readings belonging to different time periods of some
economic variable or composite of variables."

According to P.H. Karmel, " The analysis of time series has 
developed in the main as a result of investigation into the nature and 
causes of those fluctuations in economic activity called trade cycles. 
Economic theory has suggested various explanations of trade- 
cycles. Analysis of time series has attempted to test the plausibility 
or otherwise of these theories. At the same time, such analysis may 


suggest new hypothesis for economic theorists to work on ." 

From aforesaid definitions it is clear that a systematic arrangement 
of values of a variable according to various points of time is called a 
time series. 


IMPORTANCE OF TIME SERIES ANALYSIS 


Time series analysis is of great importance in state administration,
economic investment, the evaluation of socio-economic programmes
and various types of investigation. The main reason is that time
series are analysed chiefly to forecast the future and to evaluate
past behaviour. It is of great importance for economists and
businessmen. Indeed, the success of an economist or a
businessman depends more or less upon the accuracy of his
forecasts. For example, the future demand for a commodity
is estimated on the basis of analysis of the demand for that
commodity in different years of the past. Since forecasting for the future
is done on the basis of analysis of past behaviour,
analysis of time series has great importance in the study of all
economic problems. Time series analysis is important in the following
ways :


(1) Analysis of past behaviour of a variable : Analysis of the past
data of a variable reveals the effect of the various factors affecting that
variable. For example, if data on the use of chemical
fertilizers and on irrigated land are available for various years,
then analysis of these past data gives information
about the effect of chemical fertilizers and irrigation on the
production of rice or wheat. Similarly, the effect of population growth,
technological advancement and other factors can be disclosed by
the analysis of time series.

(2) Helpful in forecasting : Future tendencies of a variable can
be predicted on the basis of analysis of past values of that variable.
For example, national income, wages or production may be
predicted for the future on the basis of analysis of time series data
regarding income, wages or production. This helps in the execution
of plans for the future. Future policies are prepared on the
basis of the past behaviour of the variables related to past policies in
the economic system under study. Five year plans in other
countries have been prepared on the basis of analysis of past five
year plans. The targets of future plans are set on the basis of
past tendencies. Thus time series analysis gives the base for
long-term plans by predicting long-term tendencies.

(3) It helps in evaluation of current achievement : The
current achievement of a plan is also evaluated on the basis
of analysis of the time series data relating to it. For example, a
five year plan is evaluated on the basis of annual growth
rates in national income. Similarly, government
policies for controlling inflation are evaluated on the basis of series of price
index numbers for various years or months. Thus time series
analysis helps in the evaluation of economic plans, programmes and
policies.

(4) It helps in making comparative studies : If the data of a
variable are arranged chronologically on the basis of a measure of
time such as year, month or week, it becomes easy to compare the
achievements of one time period with those of another time period.
For example, if we want to know the effect of the green
revolution on agricultural production, the data on
agricultural production can be divided into two time periods, the period
before the green revolution and the period after it.
The growth rates can then be determined separately and compared
to disclose the effect of the green revolution on agricultural
production. Similarly, the industrial growth rate in the
period from 1970-71 to 1980-81 may be compared with the industrial
growth rate in the period from 1981-82 to 1987-88. Thus time series
analysis helps in comparing the achievements of two periods.
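A comparison of growth rates between two periods can be sketched as below. The index figures are hypothetical and are used only to show the arithmetic; the use of a compound annual growth rate is likewise an assumption, since the text does not specify how the growth rate is to be computed.

```python
def compound_growth_rate(first, last, periods):
    """Average growth rate per period between a first and a last value."""
    return (last / first) ** (1 / periods) - 1

# Hypothetical index of agricultural production (not taken from the text).
before_start, before_end, before_years = 100.0, 121.0, 2
after_start, after_end, after_years = 121.0, 174.24, 2

g_before = compound_growth_rate(before_start, before_end, before_years)
g_after = compound_growth_rate(after_start, after_end, after_years)

print(f"before: {g_before:.1%} per year, after: {g_after:.1%} per year")
```

With these assumed figures the second period grows at twice the annual rate of the first, which is the kind of contrast such a comparative study aims to bring out.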
It is clear from the above that time series analysis is of very
great importance, but it should be noted that, like other statistical
analysis, time series analysis gives us only probable
information. Especially in the social sciences, where experimentation is
not possible and where we are concerned with unpredictable human
behaviour, accurate and perfect forecasting is not possible. Even so,
the probable values obtained by time series analysis are of great
importance in the study of economic and business phenomena.


COMPONENTS OF TIME SERIES 


Various factors affect a time series. These factors are grouped
into a few broad categories, which are called the components of a time
series, because the series is the outcome of the combined effect of
the factors in all these groups. The components are :

(1) Secular trend or long-term movements 

(2) Seasonal variations 

(3) Cyclical variations 

(4) Irregular or random fluctuations 

Seasonal variations (S) and cyclical variations (C) are classified 
under regular short-time oscillations. 


(1) Long term Movement or Secular Trend 

In the words of Simpson and Kafka, “Trend, also called secular
or long-term trend, is the basic tendency of a series to grow or
decline over a period of time. The concept of trend does not include
short range oscillations, but rather the steady movement over a long
time.”

From the above definition it is clear that the secular trend shows
the general upward or downward tendency of a time series. It has no
relation to short-term oscillations. For example, the population of
India has shown an upward trend during the last 100 years. On the
other hand, the rates of child mortality have shown a downward
tendency for the last 50 years due to better medical facilities. The
main features of secular trend are as follows :

(a) Secular trend is the outcome of the effect of such factors which 
more or less remain constant for a long time or which change very 
slowly. These factors are changes in the population, changes in 
tastes and habits of the people, changes in production technique and 
natural resources, improvement in the collection of money and 
business organization, etc. These factors change very gradually or 
slowly. 

(b) Secular trend over a given period is in one direction only : it
is either an increasing trend or a decreasing trend. Both are not
present together in the same period of a time series.

(c) Secular trend may be either linear or non-linear. If the values of
a time series, when plotted on graph paper, cluster around a straight
line, the trend is known as a linear or straight line trend. If, on
the contrary, the plotted points do not fall along a straight line,
the trend is known as a non-linear or curvi-linear trend.

In a linear trend the amount of change per unit of time remains more
or less constant, while in a non-linear trend it is uneven. In the
study of economic and business phenomena a linear trend is most
commonly assumed.

(d) The word ‘secular’ is a relative term. Its meaning depends on the
nature of the problem. For example, to study the trend of national
income, agricultural production or imports and exports, data for more
than 15 years are needed. On the other hand, to study the effect of a
medicine on a patient's pulse rate, a period of 8 or 10 hours may be
treated as long-term, because the pulse rate can be recorded every
half an hour.

Secular trend is very helpful in understanding the behaviour of a
phenomenon. Important conclusions may be drawn by comparing the trends
of two series. Besides this, seasonal, cyclical and irregular
fluctuations may be studied by isolating the secular trend from the
values of a phenomenon.


Causes of secular trend : Following are the causes of a secular
trend : (1) Secular trend in a time series is mostly due to changes in
production technique. (2) Population is also an important factor
responsible for secular trend. (3) Changes in business organization
and methods, the discovery or exhaustion of natural resources,
changes in government policy, etc. are also important factors
responsible for secular trend. (4) The law of growth applicable in
various industries is also more or less responsible for secular trend
in a time series. In each industry, production increases very fast in
the initial stage, as does the demand for the commodity, but after
some time even technological changes cannot increase production
much further.


2. Seasonal Variations 


The regular short-term fluctuations in a time series that occur
within a year due to climate or social traditions and customs are
called seasonal variations. Most of the variations occurring in
economic and business fields are short-term. The prices of food grains
are lower at the time of harvesting and higher at the time of sowing.
The prices of woollen clothes increase in the winter season and
decrease in summer. The price of ice increases in summer and decreases
as the heat declines. The prices of gold and silver increase in the
marriage season. In the first few months after the beginning of the
school and college session, the demand for books increases. All these
are examples of seasonal variation. Seasonal variations may be weekly,
monthly, quarterly, half-yearly, etc. according to the nature of the
phenomenon. Seasonal variations may be fluctuations in either
direction, upward or downward.

The study of seasonal variation is quite useful for businessmen,
producers and consumers. Policies are made keeping in mind the nature
of seasonal variation. So, in order to understand the behaviour of a
time series, it is essential to adjust for seasonal variation. This
is done by isolating the seasonal variations from the secular trend
values.


Causes of Seasonal Variation : There are two main causes of 
seasonal variation : 

(i) Climate : Climate influences economic data in two ways. First,
there are two seasons in agriculture according to climate, namely the
growing season and the harvesting season. Generally the income of
farmers increases at the time of harvesting, and consequently their
purchases also increase in the harvesting season. Such seasonal
variation in the income and purchases of farmers also affects the
total business activity of retailers, wholesalers and producers. In
addition, the demand for various means of transport increases and
banks also have to increase the supply of money and credit. A similar
pressure may be seen on rail and road transport and on banks at the
time of growing. Accordingly, seasonal variation in agriculture
affects the whole economy. Second, the demand for some commodities is
affected by changes in climate. For example, there is greater demand
for woollen clothes and hot drinks in the winter season.

Thus production and employment are also seasonal in some industries.
For example, production and employment in the sugar industry are
higher between November and March.


(ii) Customs and habits : Customs and habits affect the various
types of expenditure of the consumer. For example, on the occasion
of festivals like Holi, Diwali, Dussehra, Rakhi, etc. there is a big
sale of clothes, sweets, sugar and other commodities, and customers
also withdraw more money from banks on these occasions. Similarly,
account holders withdraw more money from banks in the first week of
the month. Thus customs and habits affect various economic activities
through consumers' expenditure. Consequently the effect of seasonal
variations may be seen in economic data.

3. Cyclical Variations 


Like seasonal variations, cyclical fluctuations also recur in the
economy, but their period is more than a year. Cyclical fluctuations
are the outcome of the business cycle in the field of economics and
business. There are four phases in a business cycle, namely :
(1) boom, (2) recession, (3) depression, and (4) recovery. The period
of boom is characterized by an expansion of business activity such as
production, prices, employment, sales, etc. Then, in the period of
recession, there is a gradual decline in production, employment,
prices, sales, etc. After this, the recession turns into a
depression. In this period there is a fast decline in prices,
production, employment, etc. From this phase of depression, however,
gradual improvement takes place. After a period of depression, money
accumulates and seeks re-use, and the recovery period gradually
starts. The recovery generally develops into a boom, and one business
cycle is complete. In cyclical fluctuations the phases occur in this
order, but the length of each business cycle and of its various
phases differs. Generally a business cycle completes in about 3
years. Sometimes it takes 5 years, 7 years or 8 years. The study of
cyclical fluctuations is as important as that of seasonal variations.

[Figure: Phases of Trade Cycles]

A businessman or a producer can face the phases of recession and
depression, and can gain from the phases of recovery and boom, by
following a proper policy based on a knowledge of cyclical variations.
The biggest problem is that the duration of the business cycle is not
constant, and cyclical variations are usually mixed with irregular
variations, from which it is very difficult to isolate them.

There are significant differences between seasonal variation and
cyclical variation. First, a seasonal variation completes within a
year, while a cyclical variation takes more than a year and may take
up to 10 years to complete. Second, both the time period and the order
of seasonal variation are regular, while in cyclical variation the
order of the phases remains fixed but the durations of boom,
recession, depression and recovery do not. Third, seasonal variation
results from changes in seasons, social customs and traditions, while
cyclical variation results from other causes.


Causes of cyclical variation : Following are the main causes of
cyclical variation :

(1) Inflation or deflation of money

(2) Increase or decrease in sales

(3) Production beyond a certain limit

(4) Psychological and political conditions

4. Irregular or Random Variations

Irregular variations are the result of random factors. For example,
irregular or random variations in the economy are caused by chance
factors like famines, floods, wars, strikes, lockouts, etc. There is
no regular period for their occurrence, which is why they are said to
be irregular or random fluctuations. Sometimes these factors are very
powerful and become the cause of cyclical fluctuations. Since they are
uncertain, they cannot be forecast. For example, the prices of
commodities in India increased very fast in 1972 and 1973 due to the
arrival of about one crore refugees from Bangladesh during the 1971
war between India and Pakistan.

Causes of irregular variation : The main causes of irregular
variation are war, flood, drought, earthquake, industrial disturbance,
technological growth, etc.


DECOMPOSITION OF TIME SERIES 


The values of a variable change with time in the field of economics
and business. These changes in economic data are caused by various
factors. For instance, if the last 50 years' data on the sale of a
commodity are studied, it will be found that the yearly sale changes
every year. There are various causes of these variations, e.g.,
changes in population, in price, in consumers' habits, in the quantity
of the commodity supplied to the market, in consumers' income, in
economic conditions, etc.

The sale of the commodity would also have been affected by famine,
flood, drought, war, etc. in particular years. If we want to forecast
the changes in coming years on the basis of this study, we have to
study carefully the effect of all the factors affecting the sale. In
other words, we have to analyse the time series of sales, studying
separately the effect of each group of factors on the values of the
variable. Secular trend (T), seasonal variation (S), cyclical
variation (C) and irregular variation (I) have a joint effect on the
values of a time series. Isolating these four components in order to
study or measure them is called analysis (decomposition) of a time
series. The analysis is carried out mainly through two models :

(1) Additive model, (2) Multiplicative model.


(1) Additive model : The basic assumption of the additive model is
that the effects of the four components on the values of a time series
are additive in nature. As a formula :

Y = T + S + C + I

Here, Y is the value of the time series and T, S, C and I stand for
trend, seasonal variation, cyclical variation and irregular variation
respectively. In this model S, C and I are absolute quantities, hence
they can have positive or negative values. It is assumed that the four
components of the time series are independent of each other, that is,
the trend does not affect S, C and I. Similarly S does not affect C
and C does not affect I. Short-term variation (S + C + I) may be
isolated by subtracting the secular trend (T) from the original data :

Y - T = S + C + I

Subtracting seasonal variation from the short-term fluctuations,
cyclical and irregular variation may be obtained :

Y - T - S = C + I

Similarly, subtracting both seasonal and cyclical variation from the
short-term variations, irregular variation may be determined :

Y - T - (S + C) = Y - T - S - C = I

Thus in the additive model each component is obtained as a residual
after the others have been subtracted.
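
The successive subtractions of the additive model can be checked numerically. The following is a minimal Python sketch; the four component series are hypothetical figures chosen only for illustration and are not taken from any data in this chapter.

```python
# Hypothetical component values for four quarters (illustrative only).
trend = [100.0, 102.0, 104.0, 106.0]      # T
seasonal = [5.0, -3.0, -4.0, 2.0]         # S
cyclical = [1.0, 2.0, 1.0, -1.0]          # C
irregular = [0.5, -0.5, 0.0, 1.0]         # I

# Additive model: Y = T + S + C + I
observed = [t + s + c + i
            for t, s, c, i in zip(trend, seasonal, cyclical, irregular)]

# Remove one component at a time by subtraction.
short_term = [y - t for y, t in zip(observed, trend)]        # Y - T = S + C + I
cyc_irr = [v - s for v, s in zip(short_term, seasonal)]      # Y - T - S = C + I
residual = [v - c for v, c in zip(cyc_irr, cyclical)]        # Y - T - S - C = I

print(observed)
print(residual)
```

In practice T, S and C are not known in advance and have to be estimated from Y, so the last residual contains estimation error as well as the irregular component.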


(2) Multiplicative model : In the multiplicative model of a time
series, the original data are assumed to be the product of the
various components. As a formula :

Y = T × S × C × I = TSCI

This is the traditional (classical) model. It is used to measure the
short-term variations and to isolate them by division :

(1) Y / T = S × C × I          (2) Y / (T × S) = C × I

So it is essential to find the trend values first when analysing a
time series.
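
In the multiplicative model the components are removed by division instead of subtraction. A minimal Python sketch follows; the trend values and the seasonal, cyclical and irregular ratios are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# Hypothetical components (illustrative only). T is in the original
# units; S, C and I are ratios around 1.
trend = [100.0, 102.0, 104.0, 106.0]      # T
seasonal = [1.10, 0.95, 0.90, 1.05]       # S
cyclical = [1.02, 1.01, 0.99, 0.98]       # C
irregular = [1.01, 0.99, 1.00, 1.02]      # I

# Multiplicative model: Y = T x S x C x I
observed = [t * s * c * i
            for t, s, c, i in zip(trend, seasonal, cyclical, irregular)]

# Remove one component at a time by division.
short_term = [y / t for y, t in zip(observed, trend)]        # Y / T = S x C x I
cyc_irr = [v / s for v, s in zip(short_term, seasonal)]      # Y / (T x S) = C x I
recovered_irregular = [v / c for v, c in zip(cyc_irr, cyclical)]   # = I

print([round(v, 4) for v in recovered_irregular])
```

Because S, C and I are ratios here, they are often expressed as percentages of the trend.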


MEASUREMENT OF LONG-TERM TREND OR 
SECULAR TREND 


Following methods are used to find the trend :

(1) Free-hand curve method (2) Selected point method

(3) Semi-average method (4) Moving average method

(5) Least square method

1. Free-hand Curve Method

This is a very convenient and simple method of measuring the trend.
Under this method, the original data are plotted on a graph and a
smooth curve is then drawn through the points by hand so that it
passes through the middle of the fluctuations. This curve is called
the free-hand trend curve.

2. Selected Point Method 

Under this method two values are selected which can be taken to
represent the whole series. Generally, one point just after the first
value and a second point just before the last value are selected. A
straight line is drawn on the graph by joining these two points, and
this line shows the secular trend. The method is quite simple, but
since it depends on only two points it may not reflect the real
movement of the series.

3. Semi-average Method 


Under this method the time series is divided into two equal parts,
and a straight line showing the secular trend is constructed on the
basis of the arithmetic mean of each part. The original values are
first plotted on graph paper, and the whole series is then divided
into two parts on the basis of time. For example, if the data are
given for 12 years, the data of the first 6 years and the data of the
last 6 years form the two groups. If the time series has an odd number
of years, e.g., 13 or 15 years, two equal parts are made by omitting
the middle year. For example, if the data are given for 13 years, two
groups of the first 6 years and the last 6 years are made by leaving
out the data of the 7th year. The arithmetic mean of each group is
then determined. The average of each part is centred on the middle of
the period from which it has been calculated. For example, if the
arithmetic mean of 6 years is determined, the first point is plotted
midway between the third and fourth years from the beginning, and the
second point midway between the third and fourth years from the end.
Similarly, if the arithmetic mean of 5 years is calculated, the first
point is plotted against the third year from the beginning and the
second point against the third year from the end. A straight line
drawn by joining these two points on graph paper shows the secular
trend.


Illustration 1.
The following data relate to the production of Vinod Computers :

Year                          2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production cost (in lakh ₹)      6   14    8   16   12   30   24   38   32   40

Plot the data on graph paper and show the trend by the free-hand curve method.


Solution :

[Graph: free-hand trend curve drawn through the plotted production cost figures]

Illustration 2.
Plot the following data on graph paper and show the trend by the semi-average
method :


Year Production (in '000 units) Year Production (in '000 units) 
1999 16 2005 50 
2000 20 2006 54 
2001 18 2007 60 
2002 28 2008 54 
2003 40 2009 56 
2004 34 2010 62 


Solution : 

The number of years in the question is 12, so we shall first divide
them into two groups of 6 years each (from 1999 to 2004 and from 2005
to 2010) and then find their arithmetic means, as shown in the
following table :


Year   Production          Group total   Semi-average    Average year
       (in '000 units)

1999        16
2000        20
2001        18                 156       156/6 = 26      In the middle of 2001-2002
2002        28
2003        40
2004        34

2005        50
2006        54
2007        60                 336       336/6 = 56      In the middle of 2007-2008
2008        54
2009        56
2010        62

[Graph: production (in '000 units) plotted against year, with the semi-average trend line]


Illustration 3. 
Determine the trend of the following data by semi-average method : 


Year                  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Export (in crore ₹)      5    7    6    5    7    6    9    6    8    7   10


Solution :


The total number of years in the question is 11, so we shall not use
the data of the middle year, i.e., 2005, for finding the averages. We
shall find the averages of two groups, the first 5 years (from 2000 to
2004) and the last 5 years (from 2006 to 2010) :

Year   Export (in crore ₹)   Group total   Semi-average   Average year

2000          5
2001          7
2002          6                  30         30/5 = 6          2002
2003          5
2004          7

2005          6              (omitted)

2006          9
2007          6
2008          8                  40         40/5 = 8          2008
2009          7
2010         10
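
The semi-average calculations in Illustrations 2 and 3 can be reproduced with a short Python sketch. The function name `semi_averages` and its return layout are assumptions made for this example; the rule of omitting the middle observation when the number of years is odd follows the description in the text.

```python
def semi_averages(years, values):
    """Return ((centre_1, mean_1), (centre_2, mean_2)) for the two halves.

    With an odd number of observations the middle one is omitted.
    The centre of a half is the midpoint of its first and last year.
    """
    n = len(values)
    half = n // 2
    first = (years[:half], values[:half])
    second = (years[n - half:], values[n - half:])

    def centre_and_mean(part):
        ys, vs = part
        return (ys[0] + ys[-1]) / 2, sum(vs) / len(vs)

    return centre_and_mean(first), centre_and_mean(second)


# Illustration 2: twelve years, two groups of six.
production = [16, 20, 18, 28, 40, 34, 50, 54, 60, 54, 56, 62]
(c1, m1), (c2, m2) = semi_averages(list(range(1999, 2011)), production)

# Illustration 3: eleven years, so the middle year 2005 is left out.
exports = [5, 7, 6, 5, 7, 6, 9, 6, 8, 7, 10]
(d1, e1), (d2, e2) = semi_averages(list(range(2000, 2011)), exports)

# Yearly change implied by the line joining the two semi-average points.
slope = (m2 - m1) / (c2 - c1)
print(c1, m1, c2, m2, slope)
print(d1, e1, d2, e2)
```

A centre of 2001.5 corresponds to "the middle of 2001-2002" in the table. The slope gives the yearly increase along the semi-average trend line, from which a trend value for any year can be read off.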


Exercise (A) 

1. Find the trend with the help of the free-hand curve method for the data given below :

Year                        2000 01 02 03 04 05 06 07 08 09 10
Production (in lakh tons)     15 18 16 21 19 24 20 28 22 30 26

2. Plot the following data on graph paper and show the long-term trend by the
semi-average method :

Year                   1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
Import (in ₹ crores)     18   20   14   25   38   32   44   50   60   52   64

[ Ans. Average of group from 1993-1997 = 23 and from 1999-2003 = 54 ]

3. Determine the trend line of the following data by the semi-average method :

Year               1995 96 97 98 99 2000 01 02 03 04 05 06 07 08 09 10
Export (₹ crore)     10 14 13 12 10   15 17 13 15 19 21 19 16 17 18 19
(Sagar 2006)

[ Ans. Average year 1998-99 = 13, year 2006-07 = 18 ]

4. Determine the trend line of the following data by the semi-average method :

Year               2000 01 02 03 04 05 06 07 08 09 10
Import (₹ crore)     20 23 19 18 24 25 21 23 27 22 26

[ Ans. Average Year 2002 = 21, Year 2008 = 24 ]

5. Determine the trend line and trend values of the following data by the
semi-average method :

Year            2005 2006 2007 2008 2009 2010
Sales ('000 ₹)    40   45   44   43   47   51

[ Ans. Average Year 2006 = 43, Year 2009 = 47 ] (Indore 2006, 08)

4. Moving Average Method


Moving average method is used to analyse all types of variations of a
time series, i.e., secular trend, short-term fluctuation, cyclical
variation and irregular variation. Under this method, moving averages
are calculated from a series of data to find the trend of the time
series. The number of moving averages is fewer than the number of
items in the series, because some years at each end are left out while
calculating them. Generally a three-yearly, five-yearly or
seven-yearly moving average is calculated. Each moving average is
placed against the middle year of its group. For example, a
three-yearly moving average is placed against the (3 + 1)/2 = 2nd year
of each group, and a five-yearly moving average against the middle,
i.e., third, year of each group.

When the moving average is calculated for the second group, the first
year of the previous group is dropped and the year just after the
previous group is included, and so on. Because some years at the ends
are left out in this process, the number of moving averages is less
than the number of years in the series.

The period of the moving average depends on the nature of the data.
When deciding the period, account is generally taken of the number of
years after which the fluctuations in the values recur; that number
of years should be taken as the period of the moving average. Most
commonly the moving average is determined for an odd number of years,
but this is not a hard and fast rule. The moving average may also be
calculated for an even number of years, according to the nature of
the data. For example, a 4-yearly moving average falls against
(4 + 1)/2 = 2.5, i.e., in the middle of the 2nd and 3rd years. In
order to place it against the original value of a year, a two-yearly
moving average of the calculated moving averages is then taken (this
is called centring). Short-term fluctuations are determined by the
difference between the original figures and the moving averages.

Following are the formulae for finding moving averages :

Three-yearly moving averages = (a + b + c)/3, (b + c + d)/3, (c + d + e)/3, ...

Five-yearly moving averages = (a + b + c + d + e)/5, (b + c + d + e + f)/5, (c + d + e + f + g)/5, ...

Here a, b, c, d, e, f, g mean the original data of the first, second,
third, fourth, fifth, sixth and seventh years.
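
The formulae above amount to sliding a window of fixed length along the series. A minimal Python sketch follows; the function name `moving_average` and the seven-value series are hypothetical and used only for illustration.

```python
def moving_average(values, period):
    """Simple moving averages: one value per complete window of `period` items."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


# Hypothetical seven-year series a, b, c, d, e, f, g.
series = [4, 7, 10, 7, 13, 16, 10]

three_yearly = moving_average(series, 3)   # placed against years 2 to 6
five_yearly = moving_average(series, 5)    # placed against years 3 to 5

print(three_yearly)
print(five_yearly)
```

Seven observations give five three-yearly averages and three five-yearly averages, which shows how the number of moving averages falls as the period lengthens.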


Illustration 1.
Find the trend values with the help of a 3-yearly moving average from the following
data :


Year                   1996 1997 1998 1999 2000 2001 2002 2003
Sales (in thousand)       5    7    9   12   11   10    8   12
Year                   2004 2005 2006 2007 2008 2009 2010
Sales (in thousand)      13   17   19   14   13   12   15
(Vikram 2004; Ravishankar 2006)


Solution :


Year   Sales (in thousand)   3-yearly total   3-yearly moving average

1996          5                    -                   -
1997          7                   21                  7.00
1998          9                   28                  9.33
1999         12                   32                 10.67
2000         11                   33                 11.00
2001         10                   29                  9.67
2002          8                   30                 10.00
2003         12                   33                 11.00
2004         13                   42                 14.00
2005         17                   49                 16.33
2006         19                   50                 16.67
2007         14                   46                 15.33
2008         13                   39                 13.00
2009         12                   40                 13.33
2010         15                    -                   -

Illustration 2.

Calculate 5-yearly moving average from the following data and plot the data on 
the graph paper : 

Year   Rain (in inches)      Year   Rain (in inches)

1997        100              2004        118
1998         94              2005         98
1999         81              2006        101
2000         78              2007        103
2001        102              2008         91
2002        147              2009         89
2003        158              2010        103


Solution : Calculation of 5-yearly Moving Average 


Year Rain (in inch) 5-yearly total 5-yearly moving average 
1997 100 —— 

1998 94 —— 

1999 81 455 91 

2000 78 502 100 

2001 102 566 113 

2002 147 603 121 

2003 158 623 125 

2004 118 622 124 

2005 98 578 116 

2006 101 511 102 

2007 103 482 96 

2008 91 487 97 

2009 89 - -

2010 103 - -

Illustration 3.


Draw the trend line by finding the 4-weekly moving average from the following
data (ignore the decimal points) :


Week   Production              Week   Production
       (in thousand tonnes)           (in thousand tonnes)
  1         82                  11         75
  2         73                  12         73
  3         74                  13         75
  4         75                  14         76
  5         73                  15         75
  6         72                  16         75
  7         76                  17         78
  8         76                  18         76
  9         74                  19         78
 10         75                  20         79


Solution :
Calculation of 4-weekly Moving Average

Week   Production             4-weekly       Centred         4-weekly
       (in thousand tonnes)   moving total   moving total    moving average
  1         82                    -              -               -
  2         73                    -              -               -
  3         74                   304            599              75
  4         75                   295            589              74
  5         73                   294            590              74
  6         72                   296            593              74
  7         76                   297            595              74
  8         76                   298            599              75
  9         74                   301            601              75
 10         75                   300            597              75
 11         75                   297            595              74
 12         73                   298            597              75
 13         75                   299            598              75
 14         76                   299            600              75
 15         75                   301            605              76
 16         75                   304            608              76
 17         78                   304            611              76
 18         76                   307            618              77
 19         78                   311             -               -
 20         79                    -              -               -
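
The centring step used for the four-period moving average in Illustration 3 can be sketched in Python as follows. The function name `centred_moving_average` is an assumption made for this example; the data are the twenty production figures of the illustration, and the averages are rounded to whole numbers as the question directs.

```python
def centred_moving_average(values, period):
    """Centred moving averages for an even period.

    Adjacent period totals are added in pairs, so that each average
    falls against an actual time point instead of between two points.
    """
    totals = [sum(values[i:i + period])
              for i in range(len(values) - period + 1)]
    centred_totals = [totals[i] + totals[i + 1]
                      for i in range(len(totals) - 1)]
    return [t / (2 * period) for t in centred_totals]


production = [82, 73, 74, 75, 73, 72, 76, 76, 74, 75,
              75, 73, 75, 76, 75, 75, 78, 76, 78, 79]

# The first centred average belongs to week 3 and the last to week 18.
trend = [round(v) for v in centred_moving_average(production, 4)]
print(trend)
```

Each centred total is the sum of two adjacent four-period totals, so it is divided by 8 (twice the period) to give the moving average.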
[Graph: trend line on the basis of the 4-weekly moving average, showing the
original data and the moving average]

Exercise (B)

1. Find out the trend by using a three-yearly moving average in the following time
series :

Year    1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Price     11   12    9    8   14   15   18   16   17   19   21   24   28   30

[ Ans. -, 10.7, 9.7, 10.3, 12.3, 15.7, 16.3, 17, 17.3, 19, 21.3, 24.3, 27.3, - ]

2. The net profits of Piyush Ltd. for eleven successive years are given below.
Find the three-yearly moving averages.

Years                2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Profits (in lakhs)    2.7  2.9  3.4  5.2  5.8  6.4  9.3  9.2  9.8 10.2 11.0

[ Ans. Moving averages from 2001 to 2009 : 3.0, 3.8, 4.8, 5.8, 7.2, 8.3, 9.4, 9.7,
10.3 ]

3. Compute four yearly moving average trend from the following data : 

Year 2003 2004 2005 2006 2007 2008 2009 2010 

Production (in million tons) 20 22 28 20 32 36 42 50 

[ Ans. -, -, 24, 27.25, 30.75, 36.25, —, -] 


4. Obtain the information about trend using four-yearly moving averages and 
represent them on the graph : 

Year   1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Y        15   16   17   17   16   18   19   20   19   20   21   21

[ Ans. -, -, 16.38, 16.75, 17.25, 17.88, 18.62, 19.25, 19.75, 20.12, -, - ]

5. Assuming five yearly cycle, determine the trend of bank clearing by moving 
average method : 

Year                          1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Bank clearing (in crore ₹)      53   79   76   66   69   94  105   87   79  104   97   92

[ Ans. For 2001 to 2008 : 68.6, 76.8, 82.0, 84.2, 86.8, 93.8, 94.4, 91.8 ]

6. Calculate the five-yearly moving average (ignore decimals) and represent the
following data graphically :

Years Rainfall (in inches) Years Rainfall (in inches) 
1997 100 2004 118 

1998 94 2005 96 

1999 81 2006 101 

2000 78 2007 103 

2001 102 2008 91 

2002 147 2009 89 

2003 158 2010 103 


[ Ans. Moving averages from 1999 to 2008 : 91, 100, 113, 121, 124, 124, 115,
102, 96, 97 ]

7. Taking 6-yearly moving averages, calculate trend for the following data : 

Year           1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
No. of cases      5    8   11   10    9    7    8   11   15   20   16   13   15

[ Ans. -, -, -, 8.58, 9.08, 9.67, 10.83, 12.25, 13.33, 14.42, -, -, - ]

8. Determine the period of the moving average for the following data and 


calculate moving averages for that period : 


Year Value Year Value Year Value 
1 390 6 396 11 459 

2 381 7 387 12 438 

3 372 8 381 13 435 

4 405 9 435 14 492 

5 420 10 474 15 510 


[ Ans. Period of moving average = 5 years. Moving averages : -, -, 393.6, 394.8,
396.0, 397.8, 403.8, 414.6, 427.2, 437.2, 448.2, 459.6, 466.8, -, - ]

9. Construct a four-yearly moving average from the following data : 

Year 2004 2005 2006 2007 2008 2009 2010 


Imported cotton consumption 129 131 106 91 95 84 83 

in India (in '000 bales) 

[ Ans. —, —, 110.00, 99.87, 92.37, -, — ] 

10. Calculate the trend values by the method of moving averages, assuming a 
four-yearly cycle, from the following data relating to sugar production in India : 


Year Sugar Production (lakh tonnes) Year Sugar Production (lakh tonnes) 
1999 37.4 2005 48.4 
2000 31.1 2006 64.6 
2001 38.7 2007 58.4 
2002 39.5 2008 38.6 
2003 47.9 2009 51.4 
2004 42.6 2010 84.4 


[ Ans. -, -, 37.99, 40.74, 43.39, 47.74, 52.19, 53.00, 52.88, 55.73, -, - ]

5. Least Square Method


It is regarded as the best method of measuring the trend. Under this
method the sum of the squares of the deviations of the actual values
from the trend values is the minimum. The line of best fit for the
time series is obtained by this method. This line may be a straight
line, a second degree parabolic curve or an exponential curve.

(a) Straight line trend : The following basic equation is used for
finding a simple linear trend, or first degree trend, by the least
square method :

Y = a + bX

where, Y = Trend values
X = Unit of time

a and b are constants. The least square method is used for finding
them, by solving the following two normal equations :

ΣY = Na + bΣX
ΣXY = aΣX + bΣX²

where N is the number of years (or months). We can change the origin
and scale to simplify the calculation. In particular, if the origin is
taken at the middle of the period so that ΣX = 0, the equations reduce
to a = ΣY / N and b = ΣXY / ΣX².

Illustration 1. 

Find straight line trend by principle of least square method from the following 
data : 

Years 2006 2007 2008 2009 2010 

Value 8 10 15 13 14 


Solution : 


Year   Value   x = X - 2008    x²     xY    Trend value
 X       Y                                  Y = a + bx

2006      8        -2           4    -16    12 + 1.5 × (-2) = 9
2007     10        -1           1    -10    12 + 1.5 × (-1) = 10.5
2008     15         0           0      0    12 + 1.5 × 0 = 12
2009     13         1           1     13    12 + 1.5 × 1 = 13.5
2010     14         2           4     28    12 + 1.5 × 2 = 15

Total    60         0          10     15

Let the simple trend line be

Y = a + bx

Normal equations :

ΣY = Na + bΣx
ΣxY = aΣx + bΣx²

Substituting the values of ΣY, Σx, Σx² and ΣxY :

60 = 5a + b × 0, so 60 = 5a and a = 60/5 = 12

15 = a × 0 + b × 10, so 15 = 10b and b = 15/10 = 1.5

So all trend values may be determined by the equation Y = 12 + 1.5x,
where x is measured from 2008.
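
The solution of the two normal equations can be sketched in Python. The function name `fit_line` and its `origin` parameter are assumptions made for this example; the formulae are the general solution of the normal equations, which reduces to a = ΣY/N and b = ΣxY/Σx² when Σx = 0. The data are those of Illustration 1.

```python
def fit_line(years, values, origin):
    """Least-squares straight line Y = a + b*x, with x = year - origin."""
    x = [yr - origin for yr in years]
    n = len(values)
    sum_x = sum(x)
    sum_y = sum(values)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, values))
    # Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_xx.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


years = [2006, 2007, 2008, 2009, 2010]
values = [8, 10, 15, 13, 14]

a, b = fit_line(years, values, origin=2008)
trend = [a + b * (yr - 2008) for yr in years]
print(a, b, trend)
```

Choosing the middle year as origin does not change the fitted line; it only makes Σx zero so that the hand calculation is shorter.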


Illustration 2. 
Determine the trend values from the following data by least square method : 


Year 2004 2005 2006 2007 2008 2009 2010 
No. of production units 125 128 134 136 138 141 143 
Plot the original data and trend values on a graph. (Jabalpur 2004) 


Solution : 


Year   No. of units   Deviation from 2007    x²     xY     Trend values
 X          Y                (x)                           Y = a + bx

2004       125               -3               9    -375    135 + 3 (-3) = 126
2005       128               -2               4    -256    135 + 3 (-2) = 129
2006       134               -1               1    -134    135 + 3 (-1) = 132
2007       136                0               0       0    135 + 3 (0) = 135
2008       138                1               1     138    135 + 3 (1) = 138
2009       141                2               4     282    135 + 3 (2) = 141
2010       143                3               9     429    135 + 3 (3) = 144

N = 7    ΣY = 945          Σx = 0         Σx² = 28   ΣxY = 84

Let the simple trend line be

Y = a + bx

Normal equations for finding the values of a and b :

ΣY = Na + bΣx
ΣxY = aΣx + bΣx²

Substituting the values from the above table in both equations,

945 = 7a + b × 0          84 = a × 0 + 28b

a = 945/7 = 135           b = 84/28 = 3

So the slope per year will be 3 units and the trend equation will be :

Y = 135 + 3x

[Graph: original data and trend value line plotted against year, 2004 to 2010]


(b) Second degree parabola method : Under this method, the
following basic equation is used to find the trend :

Y = a + bX + cX²

where a, b and c are three constants which are determined by
using the following three normal equations :

ΣY = Na + bΣX + cΣX²
ΣXY = aΣX + bΣX² + cΣX³
ΣX²Y = aΣX² + bΣX³ + cΣX⁴


Illustration 3. 
Fit a second degree trend on the basis of following data : 


Year    2006 2007 2008 2009 2010
Value      5    7    4    9   10


Solution :

Taking x = X - 2008 :

Year (X)   Y     x    x²    x³    x⁴    xY   x²Y   Trend value Y = a + bx + cx²

2006       5    -2     4    -8    16   -10    20   6.14 + 1.2 (-2) + 0.43 (-2)² = 5.46
2007       7    -1     1    -1     1    -7     7   6.14 + 1.2 (-1) + 0.43 (-1)² = 5.37
2008       4     0     0     0     0     0     0   6.14 + 1.2 (0) + 0.43 (0)² = 6.14
2009       9     1     1     1     1     9     9   6.14 + 1.2 (1) + 0.43 (1)² = 7.77
2010      10     2     4     8    16    20    40   6.14 + 1.2 (2) + 0.43 (2)² = 10.26

Total     35     0    10     0    34    12    76

Let the second degree equation between x and Y be

Y = a + bx + cx²

The normal equations will be :

ΣY = Na + bΣx + cΣx²
ΣxY = aΣx + bΣx² + cΣx³
Σx²Y = aΣx² + bΣx³ + cΣx⁴

Substituting all the values,

35 = 5a + 0 + 10c ....(1)
12 = 0 + 10b + 0 ....(2)
76 = 10a + 0 + 34c ....(3)

From equation (2), b = 12/10 = 1.2

Multiplying equation (1) by 2,

70 = 10a + 20c
76 = 10a + 34c ....(3)

Subtracting,

-6 = -14c

c = 6/14 = 0.43

From equation (1), 35 = 5a + 10 (0.43) = 5a + 4.3
or 5a = 35 - 4.3 = 30.7, so a = 30.7/5 = 6.14

So Y = 6.14 + 1.2x + 0.43x²

It is the required second degree parabolic trend.
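
The parabolic fit of Illustration 3 can be checked with a short Python sketch. The function name `fit_parabola_centred` is an assumption made for this example. It applies only when x is measured from the middle period, so that Σx and Σx³ are zero and the three normal equations separate exactly as in the solution above.

```python
def fit_parabola_centred(x, y):
    """Fit Y = a + b*x + c*x**2 by least squares.

    Assumes x is centred on the middle period, so that sum(x) and
    sum(x**3) are both zero and the normal equations simplify.
    """
    n = len(x)
    sum_y = sum(y)
    sum_x2 = sum(v ** 2 for v in x)
    sum_x4 = sum(v ** 4 for v in x)
    sum_xy = sum(v * w for v, w in zip(x, y))
    sum_x2y = sum(v ** 2 * w for v, w in zip(x, y))

    b = sum_xy / sum_x2                                   # from equation (2)
    c = (n * sum_x2y - sum_x2 * sum_y) / (n * sum_x4 - sum_x2 ** 2)
    a = (sum_y - c * sum_x2) / n                          # from equation (1)
    return a, b, c


x = [-2, -1, 0, 1, 2]          # years 2006 to 2010 measured from 2008
y = [5, 7, 4, 9, 10]

a, b, c = fit_parabola_centred(x, y)
trend = [a + b * v + c * v ** 2 for v in x]
print(round(a, 2), round(b, 2), round(c, 2))
print([round(t, 2) for t in trend])
```

The sketch carries the constants at full precision, so its trend values may differ in the second decimal place from those obtained by hand with the rounded constants 6.14, 1.2 and 0.43.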

(c) Exponential curve : When there is a constant rate (percentage)
of growth or decline in a time series, the exponential curve may be
used. Its basic equation is as follows :

Y = ab^X (where a and b are constants)

Taking logarithms,

log Y = log a + X log b

This is a straight line equation in X and log Y, so the values of
log a and log b are found by solving the normal equations as for a
straight line. The values of a and b are then obtained by taking
antilogarithms.

Illustration 4. 

The sales of a multinational company in crores of rupees for the years 2004 to
2010 are given below. Estimate the probable sales for 2011 and 2013 using the
equation Y = ab^x.

Year    2004 2005 2006 2007 2008 2009 2010
Sales     32   47   65   92  132  190  275

Solution : Let x = X - 2007, where X denotes the year and Y the sales.


X       x      Y     log Y     x log Y    x²

2004   -3     32    1.5051    -4.5153      9
2005   -2     47    1.6721    -3.3442      4
2006   -1     65    1.8129    -1.8129      1
2007    0     92    1.9638     0           0
2008    1    132    2.1206     2.1206      1
2009    2    190    2.2788     4.5576      4
2010    3    275    2.4393     7.3179      9

Total   0          13.7926     4.3237     28


Let the exponential curve between x and Y be Y = ab^x. Taking
logarithms,

log Y = log a + x log b

Normal equations :

Σ log Y = N log a + log b Σx
Σ x log Y = log a Σx + log b Σx²

Substituting the values,

13.7926 = 7 log a + 0          4.3237 = 0 + 28 log b

log a = 13.7926 / 7 = 1.97037
or a = Antilog 1.97037 = 93.4053

log b = 4.3237 / 28 = 0.1544
or b = Antilog 0.1544 = 1.4269

Thus Y = 93.4053 (1.4269)^x = 93.4053 (1.4269)^(X - 2007)

Y (2011) = 93.4053 (1.4269)^4 = 387.2
Y (2013) = 93.4053 (1.4269)^6 = 788.4
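
The logarithmic fit of Illustration 4 can be reproduced in Python with the standard `math` module. This is a sketch under the same assumption as the worked solution, namely that x is measured from 2007 so that Σx = 0; the function name `forecast` is hypothetical. Small differences from the hand calculation arise because four-figure log tables are not used here.

```python
import math

years = [2004, 2005, 2006, 2007, 2008, 2009, 2010]
sales = [32, 47, 65, 92, 132, 190, 275]

x = [yr - 2007 for yr in years]              # centred, so sum(x) == 0
log_y = [math.log10(v) for v in sales]

n = len(sales)
# With sum(x) == 0 the normal equations separate:
#   sum(log Y)   = N log a
#   sum(x log Y) = log b * sum(x^2)
log_a = sum(log_y) / n
log_b = sum(xi * ly for xi, ly in zip(x, log_y)) / sum(xi ** 2 for xi in x)

a = 10 ** log_a                              # antilog
b = 10 ** log_b


def forecast(year):
    """Trend value from Y = a * b**x with x = year - 2007."""
    return a * b ** (year - 2007)


print(round(a, 2), round(b, 4))
print(round(forecast(2011), 1), round(forecast(2013), 1))
```

Since b is about 1.43, the fitted curve implies growth of roughly 43 per cent per year, which is why the exponential form suits this series better than a straight line.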


Exercise (C) 


1. Obtain the slope of the straight line trend by the method of least
squares on the basis of the following table :

Year    2006 2007 2008 2009 2010
Price    107  110  114  114  115
(Vikram 2004, 06)

[ Ans. Y = a + bX, a = 112, b = 2, so slope per year = 2; trend values 108, 110,
112, 114, 116 ]
2. Calculate the trend values by the method of least squares from the data given 


below and estimate the sales for the year 2013 : 
Year 2006 2007 2008 2009 2010 
Sales (in '000) 12 18 20 23 27 


[ Ans. Y = 20 + 3.5X (origin 2008); estimate for 2013 = 37.5 ]


3. Fit a trend line by the method of least squares from the following data : 
Year 2004 2005 2006 2007 2008 2009 2010 


Sales (in '000 Rs.) 80 90 92 83 94 99 92
(i) Find the slope of a straight line trend of these figures.
(ii) Do these figures show a rising trend or a falling trend ? How do you arrive at
this conclusion ?
(Bhopal 2006; Jabalpur 2008)

[ Ans. b = + 2, i.e., it is a rising trend; if b were negative, there would be a falling
trend. Trend : 84, 86, 88, 90, 92, 94, 96 ]


4. Compute the trend by the method of least square : 
Years 2005 2006 2007 2008 2009 2010 


Value ( y ) 83 92 71 90 169 191 
[ Ans. b = 22.6, Trend value 59.5, 82.14, 104.7, 127.3, 149.9, 172.5] 


5. Estimate the trend from the following time series (assuming a linear trend) : 
Year 2004 2005 2006 2007 2008 2009 2010 


No. of production units 125 128 133 135 140 141 143 
(Jabalpur 2004 Modified) 


[ Ans. b = 3.107, Trend values : 125.68, 128.79, 131.89, 135, 138.11,
141.214, 144.321 ]
6. Fit a straight line trend by the method of least squares in the following series. 


Show also the original data and trend line on the graph paper : 
Year 2005 2006 2007 2008 2009 2010 
Production (in crore quintal) 7 10 12 14 17 24 


[ Ans. 6.30, 9.38, 12.46, 15.54, 18.62, 21.70 ] (Indore 2004; Vikram 2005)

7. Fit a straight line trend by the method of least squares with the following data : 
X 1 2 3 4 5

Y 3 4 6 9 10

[ Ans. 2.60, 4.50, 6.4, 8.3, 10.2 ]


8. Agricultural outputs, in millions of tonnes for 5 years are given below : 
Year 2006 2007 2008 2009 2010 


Output 80 85 87 93 100 
Obtain a least square linear estimate of the output for the year 2012. 


[ Ans. 79.4, 84.2, 89.0, 93.8, 98.6 and for year 2012 estimate is 108.2] 
9. Below are given figure of production (000) of toys. Fit a straight line trend and 


calculate trend values : 
Year 2004 2005 2006 2007 2008 2009 2010 
No. of Toys (000) 12 10 14 11 13 15 16 
(Jiwaji 2004) 


[ Ans. 10.75, 11.50, 12.25, 13.00, 13.75, 14.50, 15.25] 


10. Calculate trend values by least square method from the following data : 
Year 1999 2002 2004 2005 2007 2008 2010 

Value 75 67 68 65 50 54 41 

(Bilaspur 2004) 


[ Ans. 78, 69, 63, 60, 54, 51, 45 ]

[ Hint : Since in this question the years are not given at equal intervals, take
the deviations from the middle value 2005 to find x. ]


11. Below are given the figures of profits (in 000 rupees) of a certain shop : 
Year 2004 2005 2006 2007 2008 2009 2010 


Profit (in '000 Rs.) 60 72 75 65 80 85 95 

(i) Fit a straight line trend by the method of ‘least squares’ and show the trend 
values. Estimate the profit for 2014, 

(ii) What is the monthly increase in profits ? 


[ Ans. (i) 61.429, 66.286, 71.143, 76, 80.857, 85.714, 90.571, (ii) Rs. 405.58 ]


12. Fit a trend to the following data by the least squares method : 
Year 2002 2004 2006 2008 2010 


Production (in ‘000 tons) 18 21 23 27 16 
Estimate the production in 2007 and 2012. 


[ Ans. 20.6, 20.8, 21, 21.2, 21.4; for 2007 : 21.1 ('000 tons); for 2012 : 21.6 ('000
tons) ]
13. Use method of least squares to determine sales for the year 2011. Following 


data is given: 


Year 2005 2007 2008 2009 2010 
Sales of refrigerators 100 110 130 125 160 
[ Ans. 159.59 ] 


14. Calculate the trend values by least square method for the following data : 
Year 2006 2007 2008 2009 2010 2011 

No. of Sheep (lakh) 56 55 51 47 42 38 

(Jiwaji 2006) 

[ Ans. 57.67, 53.87, 50.07, 46.27, 42.47, 38.67] 


15. Find trend values from the following data by least square method : 
Year 2005 2006 2007 2008 2009 2010 


Sale (thousand Rs.) 15 17 19 10 12 17
(Sagar 2005) 


[ Ans. 16.0, 15.6, 15.2, 14.8, 14.4, 14.0] 


16. Below are given figures of production (in thousand tons) of a sugar factory : 
Year 2001 2003 2004 2005 2006 2007 2010 
Production 77 88 94 85 91 98 90 


Fit a straight line by the least square method and calculate the trend values. 
(Indore 2005) 
[ Ans. 83.3, 86.0, 87.4, 88.8, 90.2, 91.5, 95.7] 


17. The production of coal mines is given in thousand tonnes :
Year 2004 2005 2006 2007 2008 2009 2010

Production (in thousand tonnes) 70 85 94 83 90 100 98
Find simple linear trend values by least square method. (Sagar 2006)
[ Ans. 76.78, 80.71, 84.64, 88.57, 92.50, 96.43, 100.36 ]


MEASUREMENT OF SEASONAL VARIATION 


In a time series, seasonal variations are those short-term fluctuations
which recur regularly within a year. Their measurement is very useful
from the business point of view. Seasonal fluctuations cannot be seen
in yearly totals; they can be studied only when monthly or quarterly
values are given. Their study helps in short-term forecasting and
investment decisions. Following are the methods of their measurement :


(a) Simple average method (b) Ratio to trend method 
(c) Ratio to moving average method (d) Link relatives method 


(a) Simple average method : Under this method, seasonal
variations are estimated with the help of simple arithmetic averages.
First arrange the monthly (or quarterly) values of the various years
in a table and find the simple average for each month (or quarter).
Then find the general average of these seasonal averages and, taking
it as 100, express each seasonal average as a percentage of it. These
percentage figures are called seasonal indices. So,

Seasonal variation index = (Average of the month / General average) × 100

Illustration 1.
Find seasonal indices from the following data by simple average method :
Year Summer Monsoon Autumn Winter
2006 30 81 62 119
2007 33 104 86 171
2008 42 153 99 221
2009 56 172 129 235
2010 67 201 136 302

(Indore 2008)

Solution : Calculation of Seasonal Indices 

Year Summer Monsoon Autumn Winter Total Average 

2006 30 81 62 119 292 73.0 

2007 33 104 86 171 394 98.5 

2008 42 153 99 221 515 128.8 

2009 56 172 129 235 592 148.0 

2010 67 201 136 302 706 176.5 

Total 228 711 512 1048 2499 624.8 

Average 45.6 142.2 102.4 209.6 499.8 125.0 

Seasonal (45.6/125) × 100 (142.2/125) × 100 (102.4/125) × 100 (209.6/125) × 100

Index = 36.5 = 113.8 = 81.9 = 167.7 399.9 100
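
The simple average method of Illustration 1 can be sketched in a few lines of Python. This is a minimal illustration using the data of the illustration; the function name is hypothetical. The general average is not rounded to 125 here, so the indices may differ from the table in the second decimal place.

```python
seasons = ["Summer", "Monsoon", "Autumn", "Winter"]
data = {
    2006: [30, 81, 62, 119],
    2007: [33, 104, 86, 171],
    2008: [42, 153, 99, 221],
    2009: [56, 172, 129, 235],
    2010: [67, 201, 136, 302],
}


def simple_average_indices(table):
    """Seasonal index = (average of the season / general average) * 100."""
    rows = list(table.values())
    n_seasons = len(rows[0])
    # Average of each season over all the years.
    season_avgs = [sum(row[j] for row in rows) / len(rows) for j in range(n_seasons)]
    # General average = average of the seasonal averages.
    general_avg = sum(season_avgs) / n_seasons
    return [100 * avg / general_avg for avg in season_avgs]


indices = simple_average_indices(data)
for name, value in zip(seasons, indices):
    print(name, round(value, 1))
```

By construction the indices add up to 400 for quarterly data (1200 for monthly data), which is a quick check on the arithmetic.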
Illustration 2. 

Find seasonal indices from the following data : 


Month 2007 2008 2009 2010 
January 18 20 22 24 




February 20 22 19 24 
March 18 19 20 22 
April 17 18 18 20 

May 15 16 17 18 

June 16 20 18 22 

July 17 24 24 25 
August 19 23 25 26 
September 21 23 26 27 
October 23 24 24 26 
November 23 24 25 27 
December 24 26 27 29 


Solution : 
Month 2007 2008 2009 2010 Total for Average of Seasonal 
four years four years indices 
January 18 20 22 24 84 21.00 96.46 
February 20 22 19 24 85 21.25 97.60 
March 18 19 20 22 79 19.75 90.72 
April 17 18 18 20 73 18.25 83.83 
May 15 16 17 18 66 16.50 75.78 
June 16 20 18 22 76 19.00 87.28 
July 17 24 24 25 90 22.50 103.35 
August 19 23 25 26 93 23.25 106.80 
September 21 23 26 27 97 24.25 111.38 
October 23 24 24 26 97 24.25 111.38 
November 23 24 25 27 99 24.75 113.69 
December 24 26 27 29 106 26.50 121.73 


Total 1045 261.25 1200.00 


General Average 87.083 21.7708 100 
Seasonal index for the month of January

= (Average of January / General average) × 100 = (21.00/21.7708) × 100 = 96.46


Similarly seasonal indices for other months are determined. 

(b) Ratio to trend method : This method is based on the
multiplicative model and gives more importance to trend. Under this
method the following steps are taken :

(i) Find the trend values (monthly or quarterly) by the least squares
method.

(ii) Divide the original data (O) of each time period by the
corresponding trend value (T) and multiply the quotient O/T by 100.
The data are thus freed from the effect of trend. To eliminate the
effect of cyclical and irregular fluctuations, the following steps
are adopted :

(iii) Find the arithmetic average of the trend ratios of each month
(or quarter) over all the years.

(iv) Convert the average trend ratios of the various seasons into
seasonal indices. For this, add all the seasonal trend ratio averages
and find their general average. Taking this as base 100, express each
trend ratio average as a percentage of it. These percentages are the
required seasonal indices.


Illustration 3. 
Find seasonal variation by ratio to trend method from the following data : 


Year Quarter I Quarter II Quarter III Quarter IV

2006 30 40 36 34 
2007 34 52 50 44 
2008 40 58 54 48 
2009 54 76 68 62 
2010 80 92 86 82 


Solution : 
First find the annual totals by adding the given quarterly data.
Calculation of trend by least square method :

Year (X)   Yearly Total   Average (Y)   x = X - 2008   xY     x²    Trend value Y = a + bx
2006       140            35            -2             -70    4     32
2007       180            45            -1             -45    1     44
2008       200            50             0               0    0     56
2009       260            65             1              65    1     68
2010       340            85             2             170    4     80

Total                     280            0             120    10

Normal equations :
ΣY = Na + bΣx
ΣxY = aΣx + bΣx²

280 = 5a + 0, so a = 280/5 = 56
120 = 0 + 10b, so b = 120/10 = 12
So annual growth rate = 12; quarterly growth rate = 12/4 = 3

Calculation of quarterly trend values

The trend value for 2006 is 32, which is placed against the middle
of the second and third quarters, and the quarterly growth rate is 3.
So the trend value of the second quarter = 32 - 3/2 = 30.5 and the
trend value of the third quarter = 32 + 3/2 = 33.5. The trend value of
the first quarter = 30.5 - 3 = 27.5 and the trend value of the fourth
quarter = 33.5 + 3 = 36.5. Similarly the trend values for all other
quarters can be determined, and the ratio of the original data to the
corresponding trend value can be calculated.
Trend Value (Quarterly) 

Year I II III IV

2006 27.5 30.5 33.5 36.5 
2007 39.5 42.5 45.5 48.5 
2008 51.5 54.5 57.5 60.5 
2009 63.5 66.5 69.5 72.5 
2010 75.5 78.5 81.5 84.5 


Ratio of original data to trend values : (O/T × 100)
Year Quarter I Quarter II Quarter III Quarter IV
2006 109.1 131.1 107.5 93.1
2007 86.1 122.4 109.9 90.7
2008 77.7 106.4 93.9 79.3
2009 85.0 114.3 97.8 85.5
2010 106.0 117.2 105.5 97.0
Total 463.9 591.4 514.6 445.6
Average 92.78 118.28 102.92 89.12

Quarterly Seasonal (92.78/100.8) × 100 (118.28/100.8) × 100 (102.92/100.8) × 100 (89.12/100.8) × 100
Indices = 92.0 = 117.3 = 102.1 = 88.4

Average of averages = (92.78 + 118.28 + 102.92 + 89.12)/4 = 403.1/4 = 100.8
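
The whole of Illustration 3 can be traced with a short Python sketch. It is a minimal illustration, assuming (as in the solution) that the yearly trend value is centred between the second and third quarters and that the quarterly increment is one fourth of the annual increment. The function name is hypothetical.

```python
quarterly = {
    2006: [30, 40, 36, 34],
    2007: [34, 52, 50, 44],
    2008: [40, 58, 54, 48],
    2009: [54, 76, 68, 62],
    2010: [80, 92, 86, 82],
}


def ratio_to_trend_indices(table):
    """Seasonal indices by the ratio to trend method for quarterly data."""
    years = sorted(table)
    mid_year = sum(years) / len(years)           # origin, so that sum(x) = 0
    xs = [yr - mid_year for yr in years]
    ys = [sum(table[yr]) / 4 for yr in years]    # quarterly average of each year
    a = sum(ys) / len(ys)
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    step = b / 4                                 # quarterly increment
    ratios = []
    for x, yr in zip(xs, years):
        centre = a + b * x                       # trend at the middle of the year
        trend = [centre + (q - 1.5) * step for q in range(4)]
        ratios.append([100 * o / t for o, t in zip(table[yr], trend)])
    averages = [sum(row[q] for row in ratios) / len(ratios) for q in range(4)]
    general = sum(averages) / 4
    return [100 * avg / general for avg in averages]


indices = ratio_to_trend_indices(quarterly)
print([round(v, 1) for v in indices])
```

Since the ratios are not rounded at each stage, the result may differ slightly from the hand-worked indices 92.0, 117.3, 102.1 and 88.4.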
(c) Ratio to moving average method : It is the most frequently
used method of measuring seasonal variation. This method is
analogous to the ratio to trend method, except that the ratios are
computed from moving averages. Following are the steps under it :

(i) First, the centred moving averages are determined.

(ii) Divide the original figures (O) by the corresponding moving
average (T) and multiply by 100, i.e., find (O/T) × 100.

(iii) In a separate table, find the monthly (or quarterly) average
of the values (O/T) × 100. It is known as the average seasonal
fluctuation.

(iv) Find the seasonal indices by taking the general average of these
seasonal fluctuations as base.


Illustration 4. 
Find out seasonal indices by the method of moving averages from the following data :

Year Quarter I Quarter II Quarter III Quarter IV

2008 68 62 61 63 
2009 65 58 66 61 
2010 68 63 63 67 


Solution : 
Year   Quarter   Original   Moving Total    Centred   4-quarterly moving   (O/T) × 100
                 Data (O)   of 4 quarters   Total     average (T)
2008   I         68
       II        62
                            254
       III       61                         505       63.125               96.63
                            251
       IV        63                         498       62.25                101.20
                            247
2009   I         65                         499       62.375               104.21
                            252
       II        58                         502       62.75                92.43
                            250
       III       66                         503       62.875               104.97
                            253
       IV        61                         511       63.875               95.50
                            258
2010   I         68                         513       64.125               106.04
                            255
       II        63                         516       64.5                 97.67
                            261
       III       63
       IV        67

Year Quarter I Quarter II Quarter III Quarter IV
2008 - - 96.63 101.20
2009 104.21 92.43 104.97 95.50
2010 106.04 97.67 - -

Total 210.25 190.10 201.60 196.70
Average 105.125 95.05 100.80 98.35

Seasonal (105.125/99.83) × 100 (95.05/99.83) × 100 (100.80/99.83) × 100 (98.35/99.83) × 100
Indices = 105.30 = 95.21 = 100.97 = 98.52
General Average = (105.125 + 95.05 + 100.80 + 98.35)/4 = 99.83
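
The steps of the ratio to moving average method can be sketched in Python as follows. This is a minimal illustration on the data of Illustration 4, assuming that the series starts with the first quarter and that the period is even (4 here), so that two adjacent moving totals are centred. The function name is hypothetical.

```python
# Quarterly data, 2008 Q1 to 2010 Q4, in time order.
series = [68, 62, 61, 63, 65, 58, 66, 61, 68, 63, 63, 67]


def ratio_to_moving_average_indices(values, period=4):
    """Seasonal indices from centred moving averages (even period assumed)."""
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    by_season = [[] for _ in range(period)]
    for i in range(len(totals) - 1):
        # Two adjacent moving totals are centred on position i + period // 2.
        centred_avg = (totals[i] + totals[i + 1]) / (2 * period)
        pos = i + period // 2
        by_season[pos % period].append(100 * values[pos] / centred_avg)
    averages = [sum(v) / len(v) for v in by_season]
    general = sum(averages) / period
    return [100 * avg / general for avg in averages]


indices = ratio_to_moving_average_indices(series)
print([round(v, 2) for v in indices])
```

The first two and last two quarters have no centred moving average, so each season's average is based on two ratios only, exactly as in the table above.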


(d) Link Relative Method : This method was introduced by Karl
Pearson and is regarded as the most satisfactory method of measuring
seasonal variation. It is a little more difficult than the other
methods. The steps are as follows :

(i) Divide each given value by its preceding value and express the
quotient as a percentage. These percentages are called link
relatives.

(ii) Find the average of the link relatives for each month (or
quarter).

(iii) Prepare chain relatives by taking the chain relative of the
first month (or quarter) as 100. The chain relative of each later
month (or quarter) is

(Link relative of current month (or quarter) × Chain relative of
preceding month (or quarter)) / 100

(iv) Find also a second (corrected) chain relative of the first month
(or quarter) by taking the last month (or quarter) as base.

(v) Let d = corrected chain relative of first month (or quarter) - 100.

(vi) Correction factor = d/12 (in monthly data) and d/4 (in quarterly
data).

If d is positive the correction is subtracted, and if d is negative
it is added. In the case of monthly data, we find the corrected chain
relatives by subtracting (or adding) 1, 2, 3, ...., 11 times the
correction factor respectively from (or to) the chain relatives of
February, March, April, ...., December. If the data are quarterly, we
subtract (or add) 1, 2 and 3 times the correction factor from (or to)
the chain relatives of the II, III and IV quarters.

(vii) Find the arithmetic average of the corrected chain relatives and
take it as base. Express each corrected chain relative as a
percentage of this base. These are the required seasonal indices.


Illustration 5. 
Find out seasonal indices by the Link Relative Methods for the following data : 
Quarter 
Year I II III IV
2007 100 120 150 120 
2008 90 180 180 90 
2009 120 150 210 140 
2010 150 250 300 150 


Solution : Link Relative Method
Year Quarter I Quarter II Quarter III Quarter IV
2007 - 120 125 80
2008 75 200 100 50
2009 133 125 140 67
2010 107 167 120 50

Total 315 612 485 247

Average Link Relative 105 153 121.25 61.75

Chain Relatives :
I = 100
II = (153 × 100)/100 = 153
III = (121.25 × 153)/100 = 185.51
IV = (61.75 × 185.51)/100 = 114.55

Chain relative of first quarter (on the basis of fourth quarter)
= (L.R. of 1st quarter × C.R. of 4th quarter)/100 = (105 × 114.55)/100 = 120.28

Correction factor = (120.28 - 100)/4 = 20.28/4 = 5.07

Corrected Chain Relatives :
I = 100
II = 153 - 5.07 = 147.93
III = 185.51 - 10.14 = 175.37
IV = 114.55 - 15.21 = 99.34

Average of corrected chain relatives = (100 + 147.93 + 175.37 + 99.34)/4 = 130.66

Seasonal Indices :
I = (100 × 100)/130.66 = 76.53
II = (147.93 × 100)/130.66 = 113.22
III = (175.37 × 100)/130.66 = 134.22
IV = (99.34 × 100)/130.66 = 76.03
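
The link relative procedure can be sketched in Python as below. It is a minimal illustration on the data of Illustration 5; the function name is hypothetical. The book rounds the link relatives to whole numbers, whereas the sketch keeps full precision, so the indices agree with the hand calculation only to about one decimal place.

```python
# Rows are the years 2007 to 2010; columns are quarters I to IV.
table = [
    [100, 120, 150, 120],
    [90, 180, 180, 90],
    [120, 150, 210, 140],
    [150, 250, 300, 150],
]


def link_relative_indices(rows):
    """Seasonal indices by the link relative method."""
    k = len(rows[0])                       # seasons per year
    flat = [v for row in rows for v in row]
    links = [[] for _ in range(k)]
    for i in range(1, len(flat)):
        # Link relative: current value as a percentage of the preceding one.
        links[i % k].append(100 * flat[i] / flat[i - 1])
    avg_link = [sum(v) / len(v) for v in links]
    chain = [100.0]
    for s in range(1, k):
        chain.append(avg_link[s] * chain[s - 1] / 100)
    # Second chain relative of the first season, based on the last season.
    second_first = avg_link[0] * chain[-1] / 100
    correction = (second_first - 100) / k
    # Subtracting s * correction also handles a negative correction (it adds).
    corrected = [chain[s] - s * correction for s in range(k)]
    base = sum(corrected) / k
    return [100 * c / base for c in corrected]


indices = link_relative_indices(table)
print([round(v, 2) for v in indices])
```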
Exercise (D) 


1. Find the average of quarterly trend value for the given year from the data given 


below : 
Quarters 


Year First Second Third Fourth 
2006 4 6 5 4
2007 5 8 6 6
2008 3 5 2 3
2009 6 9 4 5
2010 6 8 5 5


[ Ans. Y = 5.25 + 0.225 X (base = 2008) ] 
2. Using the data given below calculate trend values by the method of moving 


averages taking the period as 4: 
Year Summer Monsoon Autumn Winter 
2006 30 81 62 119 
2007 33 104 86 171 
2008 42 153 99 221 


2009 56 172 129 235 
2010 67 201 136 302 
(Bhopal 2004; Bilaspur 2006; Indore 2006) 


[ Ans. -, -, 73.3, 76.6, 82.5, 92.0, 99.6, 106.9, 114.6, 122.5, 130.5, 134.6, 140.7, 
146.3, 149.4, 
154.4, 158.9, 168.1, -, — ] 


3. Assuming that trend is absent, determine if there is any seasonality in the data 


given below : 
Quarter 
Year I II III IV


2007 37 41 33 35 

2008 37 39 36 36 

2009 40 40 33 31 

2010 33 44 40 40 
(Indore 2006; Rewa 2006) 


[ Ans. Seasonal indices 99.7, 110.5, 95.1, 94.7] 
4. Find the seasonal indices from the following data by Ratio to Moving Average 


method : 
Quarters 2006 2007 2008 2009 2010 
I 40 42 41 45 44
II 35 37 35 36 38
III 38 39 38 36 38
IV 40 38 42 41 42


[ Ans. 108.96, 92.35, 96.46, 102.23] 
5. Find the seasonal indices from the data regarding monthly consumption of 
wheat in a town by simple average method : 


Month 2006 2007 2008 2009 2010 
January 44 47 48 50 51 

February 45 48 44 58 50 

March 39 43 43 45 50 

April 42 44 47 52 60 

May 41 45 48 51 55 

June 40 46 47 52 55 

July 48 53 55 57 62 


August 49 44 56 55 51 
September 42 45 47 49 57 
October 45 48 51 52 64 
November 41 44 48 52 60 
December 50 55 57 63 70 


[ Ans. 96, 98, 88, 98, 96, 96, 110, 102, 96, 104, 98, 118] 

6. The seasonal indices of the sales of radio sets of Murphy Radio Company in a
certain shop are given below :

Quarter I II III IV

Seasonal index 95 88 76 114

If the total sales in the first quarter of a year be worth Rs. 20,000, determine how
much worth of radio sets should be kept in stock by the shop to meet the
demand in each of the remaining quarters.

[ Ans. II : Rs. 18,526, III : Rs. 16,000, IV : Rs. 24,000 ]

7. The seasonal indices of the sale of the readymade garments of a particular 
type in a departmental store are given below : 


Quarter Seasonal index 
I January-March 95

II April-June 80

III July-September 90

IV October-December 125


If the total sales in the first quarter of the year be worth Rs. 50,000, determine how much
worth of garments of this type should be kept in store to meet the demand in 
each of the remaining quarters ? 

[ Ans. 50,000, 42,105.26, 47,368.42, 65,789.47] 

THEORETICAL QUESTIONS 
Long Answer Questions 


1. What do you mean by time-series ? What are its main components ? State the 


importance of analysis of time series. (Bilaspur 2004; Rewa 2005) 


2. Describe seasonal variation in a time series ? What are methods to determine 


these variations ? 
(Bhopal 2006; Jabalpur 2005) 


3. What is meant by seasonal variation of a time series ? Describe the various
methods to measure it.

4. How will you find trend values in any time series by the method of least
squares ? Clarify your answer by taking a numerical example. (Indore 2006)

5. What are main components of time series ? Describe importance and analysis 


of time series. 
(Jabalpur 2006; Rewa 2006; Vikram 2006) 


6. Explain clearly the meaning of time-series and time series analysis. Indicate 
the importance of such analysis in business. (Bhopal 2004; Jiwaji 2004; Sagar 
2005) 

7. What is a time series ? Distinguish between secular trend, seasonal variation 
and cyclical fluctuations. How would you measure secular trend in any given 
data ? (Jiwaji 2006) 

Short Answer Questions 

1. Explain the differences between seasonal variation and cyclical variation. 

2. How do you measure the seasonal variation by the link relative method ?
Explain by giving an example.

3. How will you determine trend value in a time series by the method of least
squares ?

4. How is trend measured by moving average method ?

5. What is the concept of time series analysis ?

6. State the causes of various components of time series.

7. Write a note on irregular variation.

8. Explain any two of the following : (Bhopal 2005)
(a) Importance of analysis of time series 


(b) Long-term trend 
(c) Decomposition of analysis of time series 
(d) Calculations of trend by method of least square 


OBJECTIVE QUESTIONS 
State whether following statements are true or false : 
1. The sale of ice-cream in the month of June of every year represents the trend. 
2. The variations having the repeating period more than one year are called as 
cyclical variations. 
3. There is no difference between seasonal variation and cyclical variation. 
4. All the cyclical variations are seasonal variations. 
5. The moving average is not affected by the extreme values. 


[Ans.: 1. False, 2. True, 3. False, 4. False, 5. False.] 
Choose the correct answers : 
1. Which of the following is related to the change in time of harvesting : 


(a) Trend (b) Seasonal fluctuation 
(c) Cyclical variation (d) Irregular fluctuation 


2. ............ is determined by moving average.
(a) Trend (b) Seasonal variation 
(c) Cyclical fluctuation (d) Irregular 


3. The reduction in death rate due to the development in science and technology
is related to :
(a) Trend (b) Seasonal fluctuation 
(c) Cyclical variation (d) All of these 


4. The sale of sweets on the occasion of Id is related to which of the following 
components of time 


series : 
(a) Trend (b) Cyclical fluctuation 
(c) Seasonal fluctuation (d) Irregular fluctuation 
5. The time period of seasonal fluctuation is of : 
(a) 1 year (b) 6 year (c) 3 year (d) 12 year 


[Ans. 1. (b), 2. (a), 3. (a), 4. (c), 5. (a)] 


Normal equations for the second degree parabola Y = a + bX + cX² :
ΣY = Na + bΣX + cΣX²
ΣXY = aΣX + bΣX² + cΣX³
ΣX²Y = aΣX² + bΣX³ + cΣX⁴
CORRELATION ANALYSIS


— °* Meaning of Correlation 

— ° Definition of Correlation 

— ° Types of Correlation 

— ° Degree of Correlation 

— ° Methods of Determining Correlation 


MEANING OF CORRELATION 


When a change in one variable is accompanied by a change in another
variable, the two variables are said to be correlated with each other.
When both variables deviate in the same direction, that is, when an
increase (or a decrease) in one variable corresponds to an increase (or
a decrease) in the other variable, the correlation between the two
variables is positive. For example : (1) There is positive correlation
between heights and weights of school going children, (2) An increase
in the price of a commodity is associated with an increase in its
supply. So there is positive correlation between them.

If both variables deviate in opposite directions, i.e., if an increase
(or a decrease) in one variable is accompanied by a decrease (or an
increase) in the other variable, the correlation between the two
variables is negative. For example : (1) A decrease in the price of a
commodity is associated with an increase in its demand. Hence there is
negative correlation between them, (2) The correlation between
pressure and volume of a gas at fixed temperature is negative.

When two correlated variables deviate in exactly the same ratio, there
is perfect correlation between them. When the two variables deviate in
the same direction and in the same ratio, it is said to be perfect
positive correlation. On the contrary, when both variables deviate in
the same ratio but in opposite directions, there is perfect negative
correlation between them.


DEFINITION OF CORRELATION 


Correlation is a statistical technique which is used to analyse the
joint behaviour of two or more variables. It is a measure of the
relationship between variables. Its main definitions are :

According to W.I. King, " Correlation means that between two
series or groups of data there exists some causal connection ."

According to Prof. L.R. Connor, " If two or more quantities vary in
sympathy so that movements in the one tend to be accompanied by
corresponding movements in the other, then they are said to be
correlated ."

According to Prof. A.L. Boddington, " Whenever some definite
connection exists between the two or more groups, classes or series
of data, there is said to be correlation ."

According to E. Davenport, " The whole subject of correlation
refers to that inter-relation between separate characters by which they
tend in some degree at least to move together ."

TYPES OF CORRELATION 


On the basis of direction, ratio and number of variables there may 
be the following types of correlation : 

(1) Positive and negative correlation 

(2) Simple and multiple correlation 

(3) Partial and total correlation 

(4) Linear and non-linear correlation 

(1) Positive and negative correlation : When both variables
deviate in the same direction, the correlation between the two
variables is said to be positive. For example, an increase in the
price of a commodity results in an increase in the supply of that
commodity. The correlation between the two is said to be positive.

On the other hand, when two variables deviate in opposite directions,
i.e., an increase in one variable is accompanied by a decrease in the
other variable, there is negative correlation. For example, a fall in
the price of a commodity corresponds to an increase in its demand. It
reveals negative correlation between the two.

Positive and negative correlation are shown in the following tables :

Positive correlation
Price of commodity (in Rs. per unit) 10 20 30 40 50
Supply of commodity (in units) 150 180 220 270 330

Negative correlation
Price of commodity (in Rs. per unit) 10 20 30 40 50
Demand of commodity (in units) 150 140 125 110 100

From the above tables it is clear that, under positive correlation,
the price and supply of a commodity vary in the same direction.
Under negative correlation, the price of the commodity shows an
increasing tendency while its demand shows a decreasing tendency.

(2) Simple and multiple correlation : Correlation between two 
variables is called simple correlation. For example, the correlation 
between amount of money and price of commodity is called simple 
correlation. Contrary to it, correlation between more than two 
variables is called multiple correlation. For example, if we study 
together the relationship of chemical fertilizer, use of pesticides, use 
of high yielding seeds, means of irrigation etc. on production of rice, 
it is called multiple correlation. 

(3) Partial and total correlation : Partial and total correlation are
the two forms of multiple correlation. In partial correlation, we
measure the correlation between two variables while keeping the other
variables constant. Contrary to it, in total correlation, we study
the total effect of all the independent variables upon the dependent
variable. For example, if we measure the correlation between
production of rice and fertilizer while keeping constant the effect
of hybrid seeds, pesticides and means of irrigation, it is called
partial correlation. When we consider all the factors together, it is
total correlation.

(4) Linear and non-linear correlation : When the ratio of change
between two variables is constant, we call it linear correlation. If
the values of these variables are plotted on graph paper, a straight
line would be formed. Linear correlation is clear from the following
table :

Linear correlation

x 18 54 90 126 162 198

y 6 18 30 42 54 66

In the above table, the series x and y increase in the same ratio
(each value of x is three times the corresponding value of y). Hence
there is linear correlation between the two.

Contrary to it, when two correlated variables do not vary in the same
ratio, we call it non-linear correlation. For example, when we double
the amount of chemical fertilizer, the production of rice would not
necessarily be doubled. We call it non-linear correlation. Non-linear
correlation is also called curvilinear correlation. If we plot the
values of these two variables on graph paper, a curve would be
obtained. We may explain it by an example :

Non-linear correlation

x 25 30 35 40 45 50

y 5 8 15 30 42 48

In the above example, the variable x increases by a constant amount
but the rate of increase in the y variable is irregular. Hence there
is non-linear correlation between them.
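
The distinction can be checked numerically on the two tables. The following minimal Python sketch compares the ratio of change (change in y divided by change in x) between successive pairs of values; the helper names are hypothetical, and exact equality of the ratios is an assumption that suits tabulated data like these rather than real observations.

```python
from fractions import Fraction


def successive_slopes(xs, ys):
    """Ratio of change in y to change in x between consecutive pairs."""
    return [
        Fraction(ys[i + 1] - ys[i], xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]


def is_linear(xs, ys):
    """True when the ratio of change is the same throughout the table."""
    return len(set(successive_slopes(xs, ys))) == 1


linear_x = [18, 54, 90, 126, 162, 198]
linear_y = [6, 18, 30, 42, 54, 66]
curved_x = [25, 30, 35, 40, 45, 50]
curved_y = [5, 8, 15, 30, 42, 48]

print(is_linear(linear_x, linear_y), successive_slopes(linear_x, linear_y))
print(is_linear(curved_x, curved_y), successive_slopes(curved_x, curved_y))
```

In the first table every ratio equals 1/3, so the points lie on a straight line; in the second the ratios vary from pair to pair, so the points trace a curve.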


Linear and non-linear correlation may also be graphically 
explained : 


[Figure : (i) Negative linear, (ii) Positive linear, (iii) Positive non-linear, (iv) Negative non-linear correlation]

DEGREE OF CORRELATION 


Degree of correlation between two series is determined by 
correlation coefficient. 

The following results may be obtained by determining correlation 
coefficient : 


(1) Perfect correlation : When variables of two series vary in the 
same ratio then it is called perfect correlation. It may be of both types 
positive and negative. When both variables move in the same 
direction, i.e. , an increase (or a decrease) in one variable of series 
results in an increase (or a decrease) in other variable in the same 
ratio then it is called perfect positive correlation and thus correlation 
coefficient of two series will be + 1. When variables of series vary in 
same ratio but in opposite direction then the correlation between two 
series is perfect negative. Their correlation coefficient is — 1. Hence 
correlation coefficient between two series lies between — 1 and + 1. 

(2) Absence of correlation : When there is no relationship 
between two series i.e. , there is total lack of interdependence from 
both then it is considered the absence of correlation. The correlation 
coefficient of such series is zero. 


(3) Limited degree of correlation : When the correlation 
between two series is neither perfect nor absent then in such case 
there is limited degree of positive or negative correlation between 
them. It may be of following three types : 

(i) High degree of correlation : When the correlation coefficient
between two series lies between 0.75 and 0.999 in magnitude, it is
considered a high degree of correlation. It may be positive or
negative.

(ii) Moderate degree of correlation : When the correlation coefficient
between two series lies between 0.25 and 0.749 in magnitude, it is
considered a moderate degree of correlation.

(iii) Low degree of correlation : When the correlation coefficient
between two series is less than 0.25 in magnitude, it is said to be a
low degree of correlation.

Some statisticians divide the limited degree of correlation into 4
parts : very high between 0.75 and 1, high between 0.50 and 0.75, low
between 0.25 and 0.50 and very low between 0 and 0.25. There is
no hard and fast rule about it; the classification only indicates the
degree of correlation. Hence any one of these classifications may be
adopted. The first classification above may be clearly understood from
the following table :

Degree of Correlation

Degree Positive Negative

Perfect +1 -1

High Degree between +0.75 and +1 between -1 and -0.75

Moderate Degree between +0.25 and +0.75 between -0.75 and -0.25

Low Degree between 0 and +0.25 between -0.25 and 0

No Correlation 0 0
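
The three-part classification described above (high from 0.75, moderate from 0.25, low below 0.25) can be written as a small Python function. This is only a sketch of that convention; the function name and the exact wording of the labels are illustrative, and other authors draw the boundaries differently.

```python
def degree_of_correlation(r):
    """Describe a correlation coefficient r using the three-part classification."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "No correlation"
    if size == 1:
        label = "Perfect"
    elif size >= 0.75:
        label = "High degree"
    elif size >= 0.25:
        label = "Moderate degree"
    else:
        label = "Low degree"
    sign = "positive" if r > 0 else "negative"
    return f"{label} {sign}"


for r in (1.0, 0.82, -0.6, 0.1, 0.0):
    print(r, degree_of_correlation(r))
```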


IMPORTANCE OF CORRELATION 


The principle of correlation is of immense use in practical life.
Correlation is seen among most of the variables of the social,
economic, business and scientific fields. For example, correlation is
found between income and expenditure, income and saving, investment
and production, demand and price, price and supply etc. In the same
way interdependent events are also found in other sciences such
as education, psychology, medical science etc.

Correlation existing between all such variables may be partially
measured with the help of the correlation technique. The partial
measure of correlation not only provides a basis for comparative study
but also makes possible the interpolation or extrapolation of the
values of one variable on the basis of the values of the other. Thus
the study of correlation helps to forecast under business uncertainty
and to determine future policies on the basis of this knowledge.


METHODS OF DETERMINING CORRELATION 


The following are the main methods to determine correlation : 
(1) Simple Graphic Method 

(2) Scatter Diagram or Dot Diagram 

(3) Karl Pearson’s Coefficient of Correlation 

(4) Spearman’s Rank Difference Coefficient of Correlation 

(5) Coefficient of Concurrent Deviations 
(6) Least Square Method. 


1. Simple Graphic Method 


This method is used to represent the correlation between two data
series graphically. The exact degree of correlation cannot be measured
by this method; only a rough idea of the direction and extent of
correlation may be obtained. In this method, the values of the series
are shown on the ordinate (Y-axis), which is vertical, and the common
attribute of the series, such as time, number or place, on the
abscissa (X-axis), which is horizontal. While constructing the graph,
the scales should be chosen in such a way that the curves of both
series lie close together, so that the correlation between them can be
studied easily. For this, a separate scale may be taken for the data
of each series. The values of the related series are plotted on the
graph paper according to these scales. Inferences are drawn from the
graph in the following way :

(1) If the graphs of both series move upward from left side to the
right side in the same direction then this situation represents direct or
positive correlation.

(2) If the graphs of both series represent fluctuations in the 
opposite directions then it is said to be indirect or negative 
correlation. 

(3) The greater is the similarity in the graphs of both series, the 
greater is the degree of correlation between them. 

(4) If no tendency of variation is seen in both graphs in same 
direction or opposite directions then there is lack of correlation 
between the two. 


Illustration 1. 
The following table shows the income and expenditure of a wage earner working in a
factory :

Month                 Jan.  Feb.  March  April  May  June  July  August
Income (in ₹)           35    35     40     40   40    50    50      60
Expenditure (in ₹)      30    30     35     30   35    40    45      55

Is there any correlation between the income and expenditure of the wage earner ?
Show graphically.


Solution : 
[Graph : income and expenditure (Y-axis) plotted against the months January to August (X-axis)]
Fig. 2
It seems from the inspection of both graphs that the graphs of both 
series are running upward from left to right side, hence there exists 
positive correlation between the two. 


2. Scatter Diagram or Dot Diagram 


The scatter diagram is the simplest method to determine whether there
is correlation between two variables or not. For the construction of a
scatter diagram, the values of the independent variable (X) are taken on
the abscissa (X-axis) and the values of the dependent variable (Y) related
to it are taken on the ordinate (Y-axis) of the graph paper. One dot is
plotted on the graph paper for each pair of values of the X and Y series.
Thus a number of dots are plotted on the graph paper for the given pairs
of values of the X and Y series. The inference about the correlation
between the two variables is drawn on the basis of the tendency of the
dots so obtained. If the plotted points have a tendency to move in a
definite direction, it is inferred that there is correlation between
the two variables. If the points are very close to each other, we infer
a good amount of correlation between the variables. On the basis of the
clustering of these points we may infer about the correlation in the
following ways :

(i) If the clustering of points has the tendency to move upward 
from left corner to right side, the correlation between two variables is 
positive. 

(ii) If the clustering of points has the tendency to move downward 
from left to right side, the correlation between two variables is 
negative. 

(iii) If the points show no definite direction of clustering but are
scattered widely over the graph paper, there is lack of correlation
between the two variables.

(iv) If the plotted points lie on a straight line from lower corner of 
left side to the upper corner of right side, the correlation between two 
variables is perfect positive. 

(v) Contrary to it, if the plotted points lie on a straight line from 
upper corner of left side to lower corner of right side, there is perfect 
negative correlation between two variables. 


[Figure : five scatter diagrams. (1) Positive Correlation, (2) Negative Correlation, (3) Lack of Correlation, (4) Perfect Positive Correlation (r = + 1), (5) Perfect Negative Correlation (r = - 1)]

Fig. 3

From Fig. 3 it is clear that :

In Fig. (1), the plotted points of the pairs of items of the X and Y
series show a rising tendency from left to right, hence there is
positive correlation between the two variables. In Fig. (2), the plotted
points show a declining tendency from left to right, hence there is
negative correlation between the two variables. In Fig. (3), the
tendency of the plotted points of the various pairs of items of the X
and Y variables is not clear. These points are widely scattered. Hence
there is no correlation between the X and Y variables. In Fig. (4), the
pairs of items of the X and Y variables are plotted on a straight line
and the path of the plotted points rises from the lower left corner to
the upper right corner. Hence there is perfect positive correlation
between the two variables X and Y (r = + 1).

In Fig. (5), the pairs of items of the X and Y variables are plotted
on a straight line and the path of the plotted points declines from the
upper left corner to the lower right corner. Hence there is perfect
negative correlation between X and Y (r = - 1).

Though the scatter diagram is the simplest, non-mathematical method of
studying the correlation between two given variables, it does not give
the exact degree of correlation between them. By this method we can
get only an idea about the direction of correlation.


3. Karl Pearson’s Coefficient of Correlation 


The correlation coefficient of the famous statistician Karl Pearson is
said to be the best method of measuring correlation. It gives not only
the direction of correlation but also a numerical measure of its
degree. From the mathematical point of view the coefficient is
considered sound, as it is based on the arithmetic mean and the
standard deviation. This method was introduced by Karl Pearson in
1890. Karl Pearson used the following formula to measure the
correlation coefficient between two variables X and Y.
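$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)}\,\sqrt{V(Y)}} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x\,\sigma_y}$$

where, Covariance $= \mathrm{Cov}(X, Y) = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$

Variance of X $= \sigma_x^2 = \dfrac{\sum (X - \bar{X})^2}{N}$

Variance of Y $= \sigma_y^2 = \dfrac{\sum (Y - \bar{Y})^2}{N}$

Substituting all these values,

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.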


Main Characteristics of Karl Pearson's Coefficient of Correlation 

(1) Knowledge of direction of correlation : We can get the 
knowledge of direction of correlation through plus (+) or minus (—) 
signs of Karl Pearson’s coefficient of correlation. Positive sign (+) 
shows positive correlation and negative sign (—) shows negative 
correlation. 


(2) Knowledge of degree of correlation : This coefficient shows
what the degree of correlation is. The calculated value of the
coefficient always lies between - 1 and + 1, where
(- 1) means perfect negative correlation and (+ 1) means perfect
positive correlation. If the correlation coefficient comes out to be 0 it
means that there is no linear relationship between the two series.

(3) Ideal measure of correlation : This coefficient is calculated
on the basis of arithmetic mean and standard deviation. So it
represents the ideal measure of correlation.

(4) Relative measure : Correlation coefficient is relative measure 
of correlation between variables. Hence it has no unit. 

(5) No effect of changes : Correlation coefficient is not affected 
by change of origin and scale. 

(6) Need of interpretation of result : After the calculation of this
coefficient, its interpretation is essential. If the value of the
coefficient is stated as a figure only, a layman would understand
nothing from it.

(7) Mathematical knowledge is essential : Mathematical
knowledge is essential for its calculation; without it the coefficient
is difficult to calculate.
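
Characteristic (5) can be checked numerically. The following is a minimal Python sketch and not part of the original text; the function name `pearson_r` and the sample values are hypothetical, chosen only for illustration. It changes the origin and divides by a positive common factor, then compares the two coefficients.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Hypothetical sample values, used only to illustrate the property.
X = [2, 4, 6, 8, 10]
Y = [3, 7, 5, 11, 9]

# Change of origin (subtract a constant) and of scale (divide by a positive factor).
X_new = [(x - 6) / 2 for x in X]
Y_new = [(y - 7) / 4 for y in Y]

r_original = pearson_r(X, Y)
r_changed = pearson_r(X_new, Y_new)
print(round(r_original, 4), round(r_changed, 4))
```

Both printed values should agree. A negative scale factor applied to one series would reverse the sign of r while leaving its magnitude unchanged.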

Assumptions of Karl Pearson’s Coefficient of Correlation

Karl Pearson’s coefficient of correlation is based on the following 
assumptions : 

(1) There should be a linear relationship between two statistical 
series. 

(2) The independent causes which affect the series should have a 
relationship of cause and effect. 

(3) The series, which are correlated to each other, are affected by
various independent causes so as to produce normality and
probability in the distribution.

Method of Calculation of Karl Pearson’s Coefficient of Correlation
(A) Direct Method :

(i) First the arithmetic means $\bar{X}$ and $\bar{Y}$ of both the series are
calculated.

(ii) Then the deviations $x = X - \bar{X}$ of all values of series X and the
deviations $y = Y - \bar{Y}$ of all values of series Y are calculated and
their sums $\sum x$ and $\sum y$ are determined. (Here $\sum x$ and $\sum y$ will
always be zero.)

(iii) The above deviations of each item are multiplied together
($x \times y$) and then $\sum xy$ is determined.

(iv) Squaring the deviations of each series, $\sum x^2$ and $\sum y^2$ are
determined.

(v) If it is required, the standard deviations $\sigma_x$ and $\sigma_y$ of both
series are determined separately. The coefficient is then

$$r = \frac{\sum xy}{N\,\sigma_x\,\sigma_y} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
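
The steps of the direct method can be sketched in Python as follows. This is an illustrative sketch only, not the book's own procedure in code form; the function name `direct_method` is hypothetical, and the sample values are the X and Y series of Illustration 2.

```python
from math import sqrt

def direct_method(X, Y):
    """Karl Pearson's r by the direct method (deviations from actual means)."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n       # step (i)
    x = [v - mean_x for v in X]                   # step (ii): x = X - mean of X
    y = [v - mean_y for v in Y]                   #            y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(x, y))     # step (iii)
    sum_x2 = sum(a * a for a in x)                # step (iv)
    sum_y2 = sum(b * b for b in y)
    r = sum_xy / sqrt(sum_x2 * sum_y2)            # final formula
    return r, sum_xy, sum_x2, sum_y2

X = [20, 30, 23, 20, 17, 10]
Y = [10, 15, 12, 18, 25, 28]
r, sum_xy, sum_x2, sum_y2 = direct_method(X, Y)
print(sum_xy, sum_x2, sum_y2, round(r, 2))
```

The intermediate totals are returned along with r so that they can be compared with the columns of a hand-made calculation table.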


Illustration 2. 
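Calculate Karl Pearson's coefficient of correlation from the following data :
X    20   30   23   20   17   10
Y    10   15   12   18   25   28

Solution :
Calculation of Karl Pearson’s coefficient of correlation

   X     x = X - 20     x²      Y     y = Y - 18     y²       xy
  20          0          0     10        - 8         64        0
  30       + 10        100     15        - 3          9     - 30
  23        + 3          9     12        - 6         36     - 18
  20          0          0     18          0          0        0
  17        - 3          9     25        + 7         49     - 21
  10       - 10        100     28       + 10        100    - 100

ΣX = 120    Σx = 0    Σx² = 218    ΣY = 108    Σy = 0    Σy² = 258    Σxy = - 169

$$\bar{X} = \frac{120}{6} = 20, \qquad \bar{Y} = \frac{108}{6} = 18$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{-169}{\sqrt{218 \times 258}} = \frac{-169}{237.16} = -0.71$$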


(B) Short-cut Method 

In this method the deviations of the series are determined from an
assumed mean. If the question asks for a particular number to be taken
as the assumed mean, then that number is used. If the question does not
say so, we can take as the assumed mean any number lying in the middle
of the series, i.e., one which is close to the arithmetic mean. The
deviations from the assumed means are denoted by dx and dy
respectively. All the remaining calculations are done in the same
way as before. After the calculation, the coefficient of correlation
is obtained by using any one of the following formulae. If dx and dy
are large, the calculation may be simplified by dividing them by a
common factor. Following are the formulae for calculation of the
correlation coefficient :


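$$1.\quad r = \frac{N\sum dx\,dy - \sum dx \cdot \sum dy}{\sqrt{\left[N\sum dx^2 - (\sum dx)^2\right]\left[N\sum dy^2 - (\sum dy)^2\right]}}$$

$$2.\quad r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}}$$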
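
The short-cut method can be sketched in Python as below. This is a hedged illustration, not the book's own code; the function name `shortcut_method` and the choice of assumed means are assumptions made for the example, and the data are those of Illustration 2. The sketch uses the form with N multiplied through and shows that the result does not depend on which assumed means are chosen.

```python
from math import sqrt

def shortcut_method(X, Y, assumed_x, assumed_y):
    """Karl Pearson's r from deviations dx, dy taken from assumed means."""
    n = len(X)
    dx = [v - assumed_x for v in X]
    dy = [v - assumed_y for v in Y]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return numerator / denominator

X = [20, 30, 23, 20, 17, 10]
Y = [10, 15, 12, 18, 25, 28]

# Two different (arbitrary) pairs of assumed means give the same coefficient.
r_first = shortcut_method(X, Y, 20, 15)
r_second = shortcut_method(X, Y, 23, 18)
print(round(r_first, 4), round(r_second, 4))
```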
(C) Calculation Method without Deviations 


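Correlation coefficient between any two series may be calculated
without taking any deviations. The method is as follows :

(1) First the separate totals of each series, i.e., $\sum X$ and $\sum Y$, are
determined.

(2) Then all item values are squared ($X^2$ and $Y^2$) and their totals
$\sum X^2$ and $\sum Y^2$ are determined.

(3) After that each pair of values is multiplied together ($X \cdot Y$), their
total $\sum XY$ is determined and Karl Pearson’s correlation coefficient is
determined through any one of the following formulae :

$$r = \frac{N\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{\sum XY - \dfrac{\sum X \cdot \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$$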


Illustration 3. 
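From the data given below compute Karl Pearson's coefficient of correlation
without deviations :

X    2   3   4   5   6   7   8   9
Y    1   2   3   4   5   6   7   8
Solution :
   X     X²     Y     Y²     XY
   2      4     1      1      2
   3      9     2      4      6
   4     16     3      9     12
   5     25     4     16     20
   6     36     5     25     30
   7     49     6     36     42
   8     64     7     49     56
   9     81     8     64     72

ΣX = 44    ΣX² = 284    ΣY = 36    ΣY² = 204    ΣXY = 240

$$r = \frac{\sum XY - \dfrac{\sum X \cdot \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}} = \frac{240 - \dfrac{44 \times 36}{8}}{\sqrt{\left[284 - \dfrac{(44)^2}{8}\right]\left[204 - \dfrac{(36)^2}{8}\right]}}$$

$$= \frac{240 - 198}{\sqrt{(284 - 242)(204 - 162)}} = \frac{42}{\sqrt{42 \times 42}} = \frac{42}{42} = +1$$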

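The method without deviations lends itself to a very short program, since only five totals are needed. The following Python sketch is illustrative only; the function name `r_without_deviations` is hypothetical, and the data are those of Illustration 3.

```python
from math import sqrt

def r_without_deviations(X, Y):
    """Karl Pearson's r from the raw totals of X, Y, X squared, Y squared and XY."""
    n = len(X)
    sum_x = sum(X)
    sum_y = sum(Y)
    sum_x2 = sum(v * v for v in X)
    sum_y2 = sum(v * v for v in Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator, (sum_x, sum_x2, sum_y, sum_y2, sum_xy)

X = [2, 3, 4, 5, 6, 7, 8, 9]
Y = [1, 2, 3, 4, 5, 6, 7, 8]
r, totals = r_without_deviations(X, Y)
print(totals, r)
```

Here every Y value is exactly one less than its X value, so the points lie on a rising straight line and the coefficient is + 1.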

Illustration 4. 

Find Karl Pearson's correlation coefficient from the following given values of X 
and Y: 

X 78 89 96 69 59 79 68 61 

Y 125 137 156 112 107 136 123 108 

Consider 69 and 112 as assumed mean in X and Y series respectively. 

(Jiwaji 2006; Rewa 2009) 

Solution : 

Calculation of Karl Pearson’s correlation coefficient : 

Let dx = X - 69, dy = Y - 112

dx = deviation of X from assumed mean (69); dy = deviation of Y from
assumed mean (112); dx·dy = product of deviations of series X and Y.

   X      dx     dx²      Y      dy      dy²     dx·dy
  78     + 9      81    125    + 13      169     + 117
  89    + 20     400    137    + 25      625     + 500
  96    + 27     729    156    + 44     1936    + 1188
  69       0       0    112       0        0         0
  59    - 10     100    107     - 5       25      + 50
  79    + 10     100    136    + 24      576     + 240
  68     - 1       1    123    + 11      121      - 11
  61     - 8      64    108     - 4       16      + 32

N = 8    Σdx = 47    Σdx² = 1475    Σdy = 108    Σdy² = 3468    Σdx·dy = 2116

$$r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}} = \frac{2116 - \dfrac{47 \times 108}{8}}{\sqrt{\left[1475 - \dfrac{(47)^2}{8}\right]\left[3468 - \dfrac{(108)^2}{8}\right]}}$$

$$= \frac{2116 - 634.5}{\sqrt{(1475 - 276.125)(3468 - 1458)}} = \frac{1481.5}{\sqrt{1198.875 \times 2010}}$$

$$= \frac{1481.5}{\sqrt{2409738.75}} = \frac{1481.5}{1552.33} = 0.95$$


Illustration 5. 
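Find the correlation coefficient for the following table by step deviation method :

X    10   14   18   22   26   30
Y    18   12   24    6   30   36
Solution :

Let dx' = (X - 22)/4 and dy' = (Y - 24)/6

    X     Y    dx'    dy'   dx'dy'   dx'²   dy'²
   10    18   - 3    - 1       3       9      1
   14    12   - 2    - 2       4       4      4
   18    24   - 1      0       0       1      0
   22     6     0    - 3       0       0      9
   26    30     1      1       1       1      1
   30    36     2      2       4       4      4
 Total        - 3    - 3      12      19     19

$$r = \frac{N\sum dx'dy' - \sum dx' \cdot \sum dy'}{\sqrt{\left[N\sum dx'^2 - (\sum dx')^2\right]\left[N\sum dy'^2 - (\sum dy')^2\right]}} = \frac{6 \times 12 - (-3)(-3)}{\sqrt{\left[6 \times 19 - 9\right]\left[6 \times 19 - 9\right]}} = \frac{63}{105} = 0.6$$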
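
The step deviation calculation above can be sketched in Python as follows. This is an illustrative sketch; the function name `step_deviation_r` is hypothetical, and the assumed means (22 and 24) and common factors (4 and 6) are the ones implied by the step deviations in the table of this illustration.

```python
from math import sqrt

def step_deviation_r(X, Y, assumed_x, assumed_y, factor_x, factor_y):
    """Karl Pearson's r from step deviations dx' and dy'."""
    n = len(X)
    dx = [(v - assumed_x) / factor_x for v in X]   # dx' = (X - A) / common factor
    dy = [(v - assumed_y) / factor_y for v in Y]   # dy' = (Y - A) / common factor
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return numerator / denominator, (s_dx, s_dy, s_dxdy, s_dx2, s_dy2)

X = [10, 14, 18, 22, 26, 30]
Y = [18, 12, 24, 6, 30, 36]
r, totals = step_deviation_r(X, Y, 22, 24, 4, 6)
print(totals, round(r, 2))
```

Because the coefficient is unaffected by change of origin and scale, no correction for the common factors is needed at the end.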


Illustration 6. 


The distribution of population and of fully or partially blind persons is given in the
table. Find out correlation between age and blindness.

Age (in years)                  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Persons (in thousand)     100     60     40     36     24     11      6      3
No. of blinds                     55     40     40     40     36     22     18     15
Solution :
Solution : 


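Age group   Population       No. of blinds   No. of blinds
            (in thousand)                    (in one lakh)

0-10             100              55         (55/100) × 100 = 55
10-20             60              40         (40/60) × 100 = 67
20-30             40              40         (40/40) × 100 = 100
30-40             36              40         (40/36) × 100 = 111
40-50             24              36         (36/24) × 100 = 150
50-60             11              22         (22/11) × 100 = 200
60-70              6              18         (18/6) × 100 = 300
70-80              3              15         (15/3) × 100 = 500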

Since we have to find correlation coefficient between age and 
blindness so we take here age as X -series and blindness as Y- 
series. 

Calculation of Karl Pearson’s Coefficient of Correlation 

Let dx = X - 45, dy = Y - 150

X = mid value of age group, dx = deviation from assumed mean (45);
Y = No. of blinds in one lakh, dy = deviation from assumed mean (150).

Age Group   Mid value     dx     dx²      Y      dy       dy²      dx·dy
  (C.I.)       (X)
  0-10           5       - 40   1600     55    - 95      9025    + 3800
 10-20          15       - 30    900     67    - 83      6889    + 2490
 20-30          25       - 20    400    100    - 50      2500    + 1000
 30-40          35       - 10    100    111    - 39      1521     + 390
 40-50          45          0      0    150       0         0         0
 50-60          55       + 10    100    200    + 50      2500     + 500
 60-70          65       + 20    400    300   + 150     22500    + 3000
 70-80          75       + 30    900    500   + 350    122500   + 10500

N = 8    Σdx = - 40    Σdx² = 4400    Σdy = + 283    Σdy² = 167435    Σdx·dy = 21680

$$r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}} = \frac{21680 - \dfrac{(-40)(283)}{8}}{\sqrt{\left[4400 - \dfrac{(-40)^2}{8}\right]\left[167435 - \dfrac{(283)^2}{8}\right]}}$$

$$= \frac{21680 + 1415}{\sqrt{(4400 - 200)(167435 - 10011.125)}} = \frac{23095}{\sqrt{4200 \times 157423.875}}$$

$$= \frac{23095}{\sqrt{661180275}} = \frac{23095}{25713.43} = 0.898$$
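
The two stages of this illustration, first converting the number of blind persons to a common base of one lakh and then correlating with the mid values of the age groups, can be sketched in Python. The sketch is illustrative only; the names `pearson_r` and `blind_per_lakh` are hypothetical, and the rates are rounded to whole numbers as in the table above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw totals (no deviations)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

mid_ages = [5, 15, 25, 35, 45, 55, 65, 75]
population_thousand = [100, 60, 40, 36, 24, 11, 6, 3]
blind = [55, 40, 40, 40, 36, 22, 18, 15]

# Blind persons per one lakh (100,000) of population, rounded to whole numbers.
blind_per_lakh = [
    round(b / (p * 1000) * 100000)
    for b, p in zip(blind, population_thousand)
]

r = pearson_r(mid_ages, blind_per_lakh)
print(blind_per_lakh, round(r, 3))
```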


Illustration 7. 

Find out Karl Pearson’s coefficient of correlation between ages and playing 
habits : 

Age 15 16 17 18 19 20 

No. of students 250 200 150 120 100 80 


No. of regular players 200 150 90 48 30 12 
(Vikram 2003) 


Solution : 

Age   No. of Students   No. of Players   Percentage of regular players

15         250               200          (200/250) × 100 = 80
16         200               150          (150/200) × 100 = 75
17         150                90          (90/150) × 100 = 60
18         120                48          (48/120) × 100 = 40
19         100                30          (30/100) × 100 = 30
20          80                12          (12/80) × 100 = 15

By assuming the age of students as X-series and their playing
habit, i.e., percentage of regular players, as Y-series, Karl Pearson’s
correlation coefficient will be calculated.

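Let dx = X - 17, dy = Y - 60

Calculation of Correlation Coefficient

X = age, dx = deviation from assumed mean (17); Y = percentage of
regular players, dy = deviation from assumed mean (60).

   X     dx    dx²     Y      dy     dy²    dx·dy
  15    - 2     4     80    + 20     400    - 40
  16    - 1     1     75    + 15     225    - 15
  17      0     0     60       0       0       0
  18    + 1     1     40    - 20     400    - 20
  19    + 2     4     30    - 30     900    - 60
  20    + 3     9     15    - 45    2025   - 135

N = 6    Σdx = 3    Σdx² = 19    Σdy = - 60    Σdy² = 3950    Σdx·dy = - 270

$$r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}} = \frac{-270 - \dfrac{(3)(-60)}{6}}{\sqrt{\left[19 - \dfrac{9}{6}\right]\left[3950 - \dfrac{3600}{6}\right]}}$$

$$= \frac{-270 + 30}{\sqrt{17.5 \times 3350}} = \frac{-240}{\sqrt{58625}} = \frac{-240}{242.13} = -0.99$$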
Ans. : There is a negative correlation of high degree between age 
and playing habit. It is clear from this that the playing habit reduces 
as the age increases. 
Illustration 8. 
A few items are missing in the following distribution. Calculate Karl Pearson's
coefficient of correlation. The additional information given is that :
Σxy = 38.4, standard deviation of A = 2 and standard deviation of B = 3.
A    15   18    -   22    -    -   16   25
B    22   25   20    -   14   12    -    -


Solution : Correlation coefficient

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N\,\sigma_x\,\sigma_y} = \frac{\sum xy / N}{\sigma_x\,\sigma_y}, \quad \text{where } x = X - \bar{X},\; y = Y - \bar{Y}$$

$$= \frac{38.4 / 8}{2 \times 3} = \frac{38.4}{8 \times 2 \times 3} = \frac{38.4}{48} = 0.8$$
Illustration 9. 


Calculate correlation coefficient between the density of population and death- 
rate from the following data : 


District Area (in sq. kilometre) Population Death 
A 200 40000 480 
B 150 75000 1200 
C 120 72000 1080 
D 80 20000 280 
(Indore 2006) 


Solution : 

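First we shall find population density and death-rate :

Density of population = Population / Area

Death-rate = (Death / Population) × 1000

Let dx = X - 250, dy = Y - 14

District   Population     dx      dx²      Death      dy    dy²   dx·dy
           Density (X)                     Rate (Y)
   A           200       - 50     2500       12      - 2     4     100
   B           500       250     62500       16        2     4     500
   C           600       350    122500       15        1     1     350
   D           250         0         0       14        0     0       0

N = 4    Σdx = 550    Σdx² = 187500    Σdy = 1    Σdy² = 9    Σdx·dy = 950

$$r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}} = \frac{950 - \dfrac{550 \times 1}{4}}{\sqrt{\left[187500 - \dfrac{(550)^2}{4}\right]\left[9 - \dfrac{(1)^2}{4}\right]}}$$

$$= \frac{950 - 137.5}{\sqrt{(187500 - 75625)(9 - 0.25)}} = \frac{812.5}{\sqrt{111875 \times 8.75}}$$

$$= \frac{812.5}{\sqrt{978906.25}} = \frac{812.5}{989.40} = 0.82$$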
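
The derived series of this illustration can also be produced by a short program. The following Python sketch is illustrative only; the function name `pearson_r` is hypothetical. It computes the density and the death-rate per thousand for each district from the given data and then correlates the two derived series.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

area = [200, 150, 120, 80]                # sq. kilometre, districts A to D
population = [40000, 75000, 72000, 20000]
deaths = [480, 1200, 1080, 280]

# Density = population / area; death-rate = deaths per 1000 of population.
density = [p / a for p, a in zip(population, area)]
death_rate = [d / p * 1000 for d, p in zip(deaths, population)]

r = pearson_r(density, death_rate)
print(density, [round(v, 2) for v in death_rate], round(r, 2))
```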


Illustration 10. 

Calculate Karl Pearson’s coefficient of correlation taking deviations from the actual
means 52 ( X -series) and 44 ( Y -series) :


X - series 44 46 46 48 52 54 ? 56 60 60 


Y - series 36 40 42 40 ? 44 46 48 50 52 
(Indore M.Com. 2004; Vikram 2004) 


Solution : 

Here one item is unknown in each of the series X and Y. So we shall
find them first. Let these values be a and b.

X-series : N = 10, $\bar{X} = 52$

$$\bar{X} = \frac{\sum X}{N} \quad \therefore \quad \sum X = N\bar{X}$$

or 44 + 46 + 46 + 48 + 52 + 54 + a + 56 + 60 + 60 = 10 × 52
or 466 + a = 520

∴ a = 520 - 466 = 54

Y-series : N = 10, $\bar{Y} = 44$

$$\sum Y = N\bar{Y}$$

or 36 + 40 + 42 + 40 + b + 44 + 46 + 48 + 50 + 52 = 10 × 44
or 398 + b = 440

∴ b = 440 - 398 = 42

Calculation of Karl Pearson’s Correlation Coefficient 


$\bar{X} = 52$, $\bar{Y} = 44$

    X     Y    x = X - 52   y = Y - 44    x²     y²     xy
   44    36       - 8          - 8        64     64     64
   46    40       - 6          - 4        36     16     24
   46    42       - 6          - 2        36      4     12
   48    40       - 4          - 4        16     16     16
   52    42         0          - 2         0      4      0
   54    44         2            0         4      0      0
   54    46         2            2         4      4      4
   56    48         4            4        16     16     16
   60    50         8            6        64     36     48
   60    52         8            8        64     64     64

 Total              0            0       304    224    248

Deviations are taken from the actual arithmetic means, so

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{248}{\sqrt{304 \times 224}} = \frac{248}{\sqrt{68096}} = \frac{248}{260.95} = 0.95$$
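
The recovery of the missing items and the final coefficient can be checked with a short program. The Python sketch below is illustrative only; the function names `missing_value` and `pearson_r`, and the use of `None` to mark the unknown item, are assumptions made for the example.

```python
from math import sqrt

def missing_value(series, mean):
    """Missing item = N x mean - total of the known items (None marks the gap)."""
    known_total = sum(v for v in series if v is not None)
    return len(series) * mean - known_total

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

X = [44, 46, 46, 48, 52, 54, None, 56, 60, 60]
Y = [36, 40, 42, 40, None, 44, 46, 48, 50, 52]

a = missing_value(X, 52)   # actual mean of X-series is given as 52
b = missing_value(Y, 44)   # actual mean of Y-series is given as 44

X_full = [a if v is None else v for v in X]
Y_full = [b if v is None else v for v in Y]
r = pearson_r(X_full, Y_full)
print(a, b, round(r, 2))
```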


Illustration 11. 


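Find out correlation coefficient from the following data :
                                              X     Y
No. of items                                 10    10
Mean                                         62    64
Sum of squares of deviations from mean       47    49

Summation of products of deviations from mean = 24
Solution :

Given : N = 10, $\bar{X} = 62$, $\bar{Y} = 64$

$\sum (X - \bar{X})^2 = \sum x^2 = 47$

$\sum (Y - \bar{Y})^2 = \sum y^2 = 49$

$\sum (X - \bar{X})(Y - \bar{Y}) = \sum xy = 24$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{24}{\sqrt{47 \times 49}} = \frac{24}{47.99} = 0.50$$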

Illustration 12. 


Karl Pearson's correlation coefficient ( r ) between X and Y series is — 0.75. 
Their covariance is — 15. If the variance of X -series is 25, then find out standard 
deviation of Y -series. 

Solution : 

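Given : Correlation coefficient, r = - 0.75

Covariance, Cov(X, Y) = - 15

Variance of X-series $= \sigma_x^2 = 25$ ∴ $\sigma_x = 5$

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} = \frac{\text{Covariance}}{\sigma_x \cdot \sigma_y}$$

Substituting the values,

$$-0.75 = \frac{-15}{5 \times \sigma_y} \quad \therefore \quad \sigma_y = \frac{-15}{-0.75 \times 5} = 4$$

Hence the standard deviation of Y-series is 4.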
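
The rearrangement used in this illustration, solving r = Covariance / (σx · σy) for the unknown standard deviation, can be written as a small function. This is an illustrative Python sketch; the function name `sigma_y_from` is hypothetical.

```python
from math import sqrt

def sigma_y_from(r, covariance, variance_x):
    """Solve r = covariance / (sigma_x * sigma_y) for sigma_y."""
    sigma_x = sqrt(variance_x)
    return covariance / (r * sigma_x)

sigma_y = sigma_y_from(r=-0.75, covariance=-15, variance_x=25)
print(sigma_y)
```

Since r and the covariance always carry the same sign, the quotient is positive, as a standard deviation must be.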


Illustration 13. 
From the following table, calculate the coefficient of correlation by Karl
Pearson’s method, taking deviations from the actual means :

X    ?    2   10    4    8
Y    9   11    ?    8    7
Actual means of X and Y series are 6 and 8 respectively.
Solution : 
First of all the missing values of the X and Y series will be found :
(i) Calculation of missing value [say (a)] of X-series :

$$\bar{X} = \frac{\sum X}{N}$$

Here $\bar{X} = 6$, N = 5 and ΣX = a + 2 + 10 + 4 + 8

$$6 = \frac{a + 2 + 10 + 4 + 8}{5}$$

6 × 5 = a + 24

a = 30 - 24 = 6

Missing value of X series is 6.
(ii) Calculation of missing value [say (b)] of Y-series :
ΣY = $N\bar{Y}$ = 5 × 8 = 40
Missing value = 40 - (9 + 11 + 8 + 7) = 5
(iii) Calculation of Karl Pearson's coefficient of correlation :

    X    dx = (X - 6)    dx²     Y    dy = (Y - 8)    dy²    dx·dy
    6         0            0     9         1            1       0
    2       - 4           16    11         3            9    - 12
   10         4           16     5       - 3            9    - 12
    4       - 2            4     8         0            0       0
    8         2            4     7       - 1            1     - 2

N = 5    Σdx = 0    Σdx² = 40    Σdy = 0    Σdy² = 20    Σdx·dy = - 26

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{-26}{\sqrt{40 \times 20}} = \frac{-26}{28.28} = -0.92$$

There is high degree of negative correlation between X and Y.


Illustration 14. 

The following table gives the distribution of the total population and those who 
are wholly or partially blind among them. Find out if there is any relationship 
between age and blindness. 


Age 0-10 10-20 20-30 30-40 40-50 
No. of Persons 100 60 40 36 24 
Blind Persons 55 40 40 40 36 
Solution : 
Note : In order to make the data comparable it is necessary to find
out the number of blinds out of a fixed number (a common unit).
Here this unit will be one lakh.
For calculating variable 'Y' (Blindness), we can use the following
formula :

Y = (Blind Persons / No. of Persons) × 100

(55/100) × 100 = 55,    (40/60) × 100 = 67,    (40/40) × 100 = 100,

(40/36) × 100 = 111,    (36/24) × 100 = 150

Ax = 5, dx = (X - Ax); Ay = 55, dy = (Y - Ay)

Age Series   m.v. (X)     dx     dx²    Blind per     dy     dy²    dx·dy
                                        lakh (Y)
  0-10           5          0      0       55          0       0       0
 10-20          15       + 10    100       67       + 12     144     120
 20-30          25       + 20    400      100       + 45    2025     900
 30-40          35       + 30    900      111       + 56    3136    1680
 40-50          45       + 40   1600      150       + 95    9025    3800

 N = 5                  + 100   3000                 208   14330    6500

$$r = \frac{N\sum dx\,dy - \sum dx \cdot \sum dy}{\sqrt{\left[N\sum dx^2 - (\sum dx)^2\right]\left[N\sum dy^2 - (\sum dy)^2\right]}} = \frac{6500 \times 5 - (100 \times 208)}{\sqrt{\left[3000 \times 5 - (100)^2\right]\left[14330 \times 5 - (208)^2\right]}}$$

$$= \frac{32500 - 20800}{\sqrt{[15000 - 10000][71650 - 43264]}} = \frac{11700}{\sqrt{5000 \times 28386}} = \frac{11700}{11913.44} = 0.982$$


Illustration 15. 

From the following data, find out if there is any relationship between density of 
population and death rate : 

District Area (in sq. kilometre) Population No. of Deaths 

A 120 24000 288 

B 150 75000 1125 

C 80 48000 768 

D 50 40000 720 

E 200 50000 650 


Solution : 


In this question, the correlation between population density and death rate is to
be determined. So first the population density and death rate will be found. The
population density per square kilometre will be calculated by dividing the
population by the area of each district. The death rate will be determined by
the formula

Death rate = (No. of deaths × 1000) / Population of the District

It is to be noted that the death rate is expressed per thousand of population.

Ax = 500, dx = (X - Ax); Ay = 15, dy = (Y - Ay)

District   Density X     dx       dx²     Death Rate Y    dy    dy²   dx·dy
   A          200       - 300    90000        12          - 3     9     900
   B          500           0        0        15            0     0       0
   C          600       + 100    10000        16          + 1     1     100
   D          800       + 300    90000        18          + 3     9     900
   E          250       - 250    62500        13          - 2     4     500

 N = 5       2350       - 150   252500                    - 1    23    2400

$$r = \frac{N\sum dx\,dy - \sum dx \cdot \sum dy}{\sqrt{\left[N\sum dx^2 - (\sum dx)^2\right]\left[N\sum dy^2 - (\sum dy)^2\right]}} = \frac{5 \times 2400 - (-150)(-1)}{\sqrt{\left[5 \times 252500 - (-150)^2\right]\left[5 \times 23 - (-1)^2\right]}}$$

$$= \frac{12000 - 150}{\sqrt{(1262500 - 22500)(115 - 1)}} = \frac{11850}{\sqrt{1240000 \times 114}} = \frac{11850}{11889.49} = 0.997$$

There is high degree of positive correlation between X and Y.
Illustration 16. 

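Given : $\sum (X - \bar{X})^2 = 360$, $\sum (Y - \bar{Y})^2 = 250$

$\sum (X - \bar{X})(Y - \bar{Y}) = 225$, N = 12

Find out coefficient of correlation between X and Y .
Solution :

Given : $\sum (X - \bar{X})^2 = \sum dx^2 = 360$

$\sum (Y - \bar{Y})^2 = \sum dy^2 = 250$

$\sum (X - \bar{X})(Y - \bar{Y}) = \sum dx\,dy = 225$

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{225}{\sqrt{360 \times 250}} = \frac{225}{\sqrt{90000}} = \frac{225}{300} = 0.75$$

There is high degree of positive correlation between X and Y.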
Illustration 17. 


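X : $\sigma_x = 28.72$
Y : $\bar{Y} = 18$, $\sigma_y = 8.51$, N = 10
Σdx·dy = + 2350


Compute coefficient of correlation between X and Y .


Solution :

$$r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y} = \frac{2350}{10 \times 28.72 \times 8.51} = \frac{2350}{2444.072} = 0.96$$

There is high degree of positive correlation between X and Y.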
Illustration 18. 

The following statistical measures are obtained to calculate the correlation
coefficient between 12 pairs of values of X and Y data :

ΣX = 30, ΣY = 5, ΣX² = 670, ΣY² = 285, ΣXY = 334

It was, however, discovered at the time of checking that one pair of
observations of X and Y was wrongly written as X = 11 and Y = 4 while the correct
values were X = 10 and Y = 14. So find out the correct correlation coefficient.


Solution : 


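For finding the correct Karl Pearson’s correlation coefficient it is
essential to correct the wrong values given in the question :

Correct ΣX = 30 - 11 + 10 = 29

Correct ΣY = 5 - 4 + 14 = 15

Correct ΣX² = 670 - 11² + 10² = 649

Correct ΣY² = 285 - 4² + 14² = 465

Correct ΣXY = 334 - (11 × 4) + (10 × 14) = 334 - 44 + 140 = 430
N = 12

Correlation coefficient,

$$r = \frac{\sum XY - \dfrac{\sum X \cdot \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}} = \frac{430 - \dfrac{29 \times 15}{12}}{\sqrt{\left[649 - \dfrac{(29)^2}{12}\right]\left[465 - \dfrac{(15)^2}{12}\right]}}$$

$$= \frac{430 - 36.25}{\sqrt{(649 - 70.08)(465 - 18.75)}} = \frac{393.75}{\sqrt{578.92 \times 446.25}} = \frac{393.75}{508.27} = 0.775$$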
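
The correction of wrongly recorded totals can be sketched in Python as follows. This is an illustrative sketch only; the function names `corrected_totals` and `r_from_totals` and the dictionary keys are hypothetical. Each total is adjusted by removing the contribution of the wrong pair and adding that of the correct pair.

```python
from math import sqrt

def corrected_totals(totals, wrong, right):
    """Remove the wrong (x, y) pair from each total and add the right one."""
    (wx, wy), (rx, ry) = wrong, right
    return {
        "x": totals["x"] - wx + rx,
        "y": totals["y"] - wy + ry,
        "x2": totals["x2"] - wx ** 2 + rx ** 2,
        "y2": totals["y2"] - wy ** 2 + ry ** 2,
        "xy": totals["xy"] - wx * wy + rx * ry,
    }

def r_from_totals(n, t):
    """Karl Pearson's r from the totals of X, Y, X squared, Y squared and XY."""
    numerator = n * t["xy"] - t["x"] * t["y"]
    denominator = sqrt((n * t["x2"] - t["x"] ** 2) * (n * t["y2"] - t["y"] ** 2))
    return numerator / denominator

given = {"x": 30, "y": 5, "x2": 670, "y2": 285, "xy": 334}
fixed = corrected_totals(given, wrong=(11, 4), right=(10, 14))
r = r_from_totals(12, fixed)
print(fixed, round(r, 3))
```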
Exercise (A) 

1. Find the coefficient of correlation of X and Y : 

X 58 53 41 39 43 46 43 45 41 47 45 44 

Y 11 27 31 42 30 28 28 20 19 20 32 30 

[Ans. r =— 0.648 ] (Jiwaji 2005) 


2. Find the coefficient of correlation of lower age of husband and wife : 
Age of Husband 23 27 28 29 30 31 33 35 36 39 


Age of Wife 18 22 23 24 25 26 28 29 30 32 

[Ans. r = 0.99 ] 

3. Find the Karl Pearson's coefficient of correlation from the following value of X 
and Y: 

X 78 89 97 69 59 79 68 61 

Y 115 137 156 112 100 136 123 108 

[Ans. r = 0.909 ] 

4. Find the Karl Pearson's coefficient between goods price and supply; from the 
following data : 

Price 17 18 19 20 21 22 23 24 25 26 

Supply (kg) 38 37 38 33 32 33 34 29 26 23 

[Ans. r =-—0.922 ] 

5. Describe the Karl Pearson's coefficient method and calculate the coefficient of 
correlation from the following data : 

X 10 12 14 18 20 24 28 30 32 34 

Y 5 6 7 10 12 15 18 20 21 22

[Ans. r = + 0.998 ] 


6. Find the coefficient of correlation between x and y .

x 1 2 3 4 5 6 7 8 9

y 12 11 13 15 14 17 16 19 18

[Ans. r = 0.933 ] 

7. Calculate Karl Pearson's coefficient of correlation from the following data,
using 20 as the working mean for price, and 70 as the working mean for
demand :
Price 14 16 17 18 19 20 21 22 23
Demand 84 78 70 75 66 67 62 58 60

[Ans. r = - 0.954 ]
8. Find out the coefficient of correlation between the sales and expenses of the
following 10 firms :
Firms 1 2 3 4 5 6 7 8 9 10
Sales 50 50 55 60 65 65 65 60 60 50
Expenses 11 13 14 16 16 15 15 14 13 13

[Ans. r = 0.787 ] (Vikram 2005)
9. From the following table calculate the coefficient of correlation : 
x 57 42 40 38 42 45 42 44 40 46 44 43 


y 10 26 30 41 29 27 27 19 18 19 31 29 
Take 44 and 26 as assumed means. 


[Ans. r =—0.7327 ] 


10. Find the coefficient of correlation between plant capacity and plant utilisation in an industry :
Year 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 


Plant capacity x 26 28 30 30 31 32 38 49 54 60 62 60 

Plant utilisation y 20 22 26 25 24 28 30 39 48 50 56 48 

[Ans. r = 0.988 ] 

11. The following table gives the results of the Higher Secondary Examination
held in 2010. Calculate the coefficient of correlation and estimate the P.E. From
your results can you definitely assert that failure is correlated with age :
Age of candidate 13-14 14-15 15-16 16-17 17-18 18-19 19-20 20-21 21-22

% of failure 39.2 40.6 43.4 34.2 36.6 39.2 48.9 47.1 54.5

[Ans. r = 0.68, P.E. = 0.12. Since the correlation coefficient is less than 6 times
the probable error, we cannot definitely assert that failure is correlated
with age. ]

12. With the following data in 6 cities calculate the coefficient of correlation by Karl
Pearson's method between the density of population and the death rate :
Cities Area (in sq. miles) Population (in '000) No. of death 
A 150 30 300 
B 180 90 1,440 
C 100 40 560 
D 60 42 840 
E 120 72 1,224 
F 80 24 312 


[Ans. r = 0.988 ] (Ravishankar 2006) 
13. Find out Karl Pearson's coefficient between age and literacy from the
following data :
Age group Total population (in '000) Literate population (in '000)
10-20 120 100
20-30 100 75
30-40 80 80
40-50 60 60
50-60 40 25
60-70 15 10
70-80 8 4

(Bilaspur 2006)

[ Hint : (i) Find the literacy by finding the percentage of literates in the total
population.
(ii) Find correlation coefficient between age and literacy. ]

[Ans. r = - 0.621 ]
14. The following distribution gives the total population and the blind or partly
blind persons. Find out the coefficient of correlation between blindness and age :
Age 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Population (in '000) 100 60 40 36 24 11 6 3

Blinds 55 40 40 40 36 22 18 15

[Ans. r = 0.898 ] (Jabalpur 2008)

[Hint : Calculate the number of blinds in a fixed population, say, 1,00,000 as
follows :

On 60,000 population in age group 10-20, the no. of blinds is 40

∴ on 1,00,000 it will be (40 / 60,000) × 1,00,000 = 67

In the same way calculate all on 1,00,000. It will be Y-series. Age will be X-
series. Find its mid value. ]

15. Find Karl Pearson's correlation coefficient between the age and playing habit
of the people from the following information :
Age group (years) No. of people No. of players
15-20 200 150
20-25 270 162
25-30 340 170
30-35 360 180
35-40 400 180
40-45 300 120

[Ans. r = - 0.9395 ]


16. Find out if there is any relation between age and illiteracy from the following :
Age group 10-20 20-30 30-40 40-50 50-60 60-70 70-80

Total population (in thousands) 120 100 80 50 25 15 5
Illiterate population (in hundreds) 100 75 60 30 20 10 5
[Ans. 0.243 ]

[Note : Illiterate population per thousand ( Y -series) = (Illiterate population / Total population) × 1000 ]
17. The following figures give the rainfall in inches for the year and the production in
tonnes for the Rabi crop and the Kharif crop. Is there any correlation between
production and rainfall ?
Rainfall 20 22 24 26 28 30 32
Rabi production 15 18 20 32 40 39 40
Kharif production 15 17 20 18 20 21 15

[Ans. : Correlation coefficient between rain and Rabi is 0.95 and
between rain and Kharif is 0.025. ]

[Hint : Here you have to calculate the correlation with rain separately two times. ]


CORRELATION IN GROUPED SERIES 


Pearson's correlation coefficient can also be calculated in grouped
series as in individual series, but a correlation table is needed for
grouped series. A correlation table is a bivariate frequency table.
When the number of observations is sufficiently large, the data are
classified two ways and a correlation table is constructed. In this
table the values of one variable ( X or Y ) are shown in rows
and the values of the other variable are shown in columns. The
frequencies of each class are noted in the square cells of the table.
The total number of cells in a correlation table is equal to the
product of the number of rows and columns, m × n. The total
frequencies of each class are shown in the marginal row and marginal
column. The inner cells of the table show how the frequencies of each
row are distributed over the columns, and likewise how the frequencies
of each column are distributed over the rows. The frequency written in
each cell relates to both variables.


This fact is clear from the following table :
Correlation Table

Marks in              Marks in Economics              Total
Statistics     5-15    15-25    25-35    35-45
  0-10           1        1        -        -            2
 10-20           3        6        5        1           15
 20-30           1        8        9        2           20
 30-40           -        3        9        3           15
 40-50           -        -        4        4            8

 Total           5       18       27       10           60
In the above table, a general and minute analysis of the marks
obtained by 60 students in Statistics and Economics is done. In the
table, the frequency of each cell is related to both variables, i.e.,
marks in Statistics and Economics. For example, it is clear from the
frequency 1 written in the cell of the first column of the first row
that there is one student whose marks in Statistics are between 0-10
and marks in Economics are between 5-15. It is clear from the table
that there are in total 5 students having marks between 5-15 in
Economics. Out of these, 1 student obtained between 0-10, 3 students
obtained between 10-20 and 1 student obtained between 20-30 marks in
Statistics. There are 18 students in the class of 15-25 marks in
Economics, of whom 1 student is between 0-10 marks, 6 students are
between 10-20 marks, 8 students are between 20-30 marks and 3
students are between 30-40 marks in Statistics. It is clear from the
table that the students who have obtained more marks in Statistics
have also obtained more marks in Economics. Hence there is positive
correlation between the two, but the table gives no idea about the
degree of correlation. We have to calculate the correlation
coefficient to find the degree of correlation between two variables.
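
The degree of correlation in such a table can be found by treating every cell as so many repeated pairs of class mid-values. The Python sketch below is illustrative only and is not the tabular procedure of the text: the function name `grouped_pearson_r` is hypothetical, and the mid-values are used directly instead of step deviations, which gives the same coefficient because r is unaffected by change of origin and scale. The data are those of the correlation table above.

```python
from math import sqrt

def grouped_pearson_r(row_mids, col_mids, freq):
    """Correlation from a two-way frequency table.

    freq[i][j] is the frequency of the cell in row i and column j.
    """
    n = sum(sum(row) for row in freq)
    s_r = s_c = s_rr = s_cc = s_rc = 0.0
    for i, r_val in enumerate(row_mids):
        for j, c_val in enumerate(col_mids):
            f = freq[i][j]
            s_r += f * r_val            # total of f x row mid-value
            s_c += f * c_val            # total of f x column mid-value
            s_rr += f * r_val ** 2
            s_cc += f * c_val ** 2
            s_rc += f * r_val * c_val   # total of f x product of mid-values
    numerator = n * s_rc - s_r * s_c
    denominator = sqrt((n * s_rr - s_r ** 2) * (n * s_cc - s_c ** 2))
    return n, numerator / denominator

statistics_mids = [5, 15, 25, 35, 45]   # rows: 0-10, 10-20, ..., 40-50
economics_mids = [10, 20, 30, 40]       # columns: 5-15, 15-25, 25-35, 35-45
table = [
    [1, 1, 0, 0],
    [3, 6, 5, 1],
    [1, 8, 9, 2],
    [0, 3, 9, 3],
    [0, 0, 4, 4],
]
n, r = grouped_pearson_r(statistics_mids, economics_mids, table)
print(n, round(r, 3))
```

Empty cells of the table are entered as 0 so that every row has the same number of columns.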

Calculation of Karl Pearson’s correlation coefficient in 
grouped series : We have to do the following works to find Karl 
Pearson's correlation coefficient in grouped series : 

(1) We add 3 extra rows at the top and 3 rows at the bottom, and 3
columns on the left and 3 columns on the right of the table. We write
the mid values of variable Y , the deviation ( dy ) from the assumed mean
and the step deviation ( dy' ) in the columns on the left side, and fdy',
fdy'² and fdx'dy' respectively in the columns on the right side.
Similarly, the mid values of X , dx and dx' are written in the rows at
the top and fdx', fdx'² and fdx'dy' in the rows at the bottom.

(2) The deviations of the mid values should be determined by taking
any one of the mid values in the X and Y series as the assumed mean.
If the class widths are equal, the step deviations of these deviations
should be taken by dividing them by the common class interval.

(3) Multiplying together the deviations of the X and Y series, we write
the product dx·dy with its sign in the upper left hand corner of each
cell. This is not needed where there is no frequency in the cell.

(4) To calculate fdx·dy, we multiply the dx·dy written in the upper left
corner of each cell by the frequency of that cell and write the
product in the lower right corner of the cell.

(5) Multiplying the deviations of X and Y by the total frequency of
the related column or row, we calculate fdx and fdy respectively.
We then add all fdx values, with their signs, along the fdx row and
all fdy values, with their signs, down the fdy column, and thus
obtain Σfdx and Σfdy.

(6) Multiplying fdx by dx and fdy by dy again, we write the products
fdx² and fdy² in the corresponding row and column. We then obtain
Σfdx² and Σfdy² by adding the fdx² row and the fdy² column.

(7) To obtain Σfdx·dy, we add the fdx·dy values written in the lower
right-hand corners of the cells of each row and write the sum in the
last column of that row, headed fdx·dy. Adding this fdx·dy column
gives Σfdx·dy. Similarly, adding the fdx·dy values written in the
cells of each column, we write the sum in the last row of that
column. Adding this fdx·dy row also gives Σfdx·dy. Note that the
total of the last row must be equal to the total of the last column
on the right side.
(8) Total of frequencies is the number of items. 


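(9) Finally, we use the following formula to find Karl Pearson's
correlation coefficient :

$$ r = \frac{N\sum f\,dx\,dy - \sum f\,dx \cdot \sum f\,dy}{\sqrt{N\sum f\,dx^2 - (\sum f\,dx)^2}\;\sqrt{N\sum f\,dy^2 - (\sum f\,dy)^2}} $$

From step deviation method :

$$ r = \frac{N\sum f\,dx'\,dy' - \sum f\,dx' \cdot \sum f\,dy'}{\sqrt{N\sum f\,dx'^2 - (\sum f\,dx')^2}\;\sqrt{N\sum f\,dy'^2 - (\sum f\,dy')^2}} $$

Illustration 1.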

Calculate the coefficient of correlation between the ages of 100 couples of
husbands and their wives :

Age of wives            Age of Husbands (X-series)
(Y-series)     20-30  30-40  40-50  50-60  60-70  Total
15-25            5      9      3      -      -      17
25-35            -     10     25      2      -      37
35-45            -      1     12      2      -      15
45-55            -      -      4     16      5      25
55-65            -      -      -      4      2       6
Total            5     20     44     24      7     100

(Ravishankar 1997)


Solution :
Calculation of Coefficient of Correlation

Each cell shows the frequency f, with f·dx'·dy' in brackets. Here
dx' = (X - 45)/10 and dy' = (Y - 40)/10.

Age of Wives (Y)              Age of Husbands (X)                    Total
Class  m.v.  dy  dy'   20-30    30-40    40-50    50-60    60-70      f   fdy'  fdy'²  fdx'dy'
          m.v. of X :    25       35       45       55       65
                 dx :   -20      -10        0       10       20
                dx' :    -2       -1        0        1        2
15-25   20  -20  -2    5 (20)   9 (18)   3 (0)      -        -       17   -34    68      38
25-35   30  -10  -1      -     10 (10)  25 (0)    2 (-2)     -       37   -37    37       8
35-45   40    0   0      -      1 (0)   12 (0)    2 (0)      -       15     0     0       0
45-55   50   10   1      -        -      4 (0)   16 (16)   5 (10)    25    25    25      26
55-65   60   20   2      -        -        -      4 (8)    2 (8)      6    12    24      16
Total f                  5       20       44       24        7      100   -34   154      88
fdx'                   -10      -20        0       24       14        8
fdx'²                   20       20        0       24       28       92
fdx'dy'                 20       28        0       22       18       88


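$$ r = \frac{100 \times 88 - (8)(-34)}{\sqrt{100 \times 92 - (8)^2}\;\sqrt{100 \times 154 - (-34)^2}} = \frac{9072}{\sqrt{9136 \times 14244}} = \frac{9072}{11407.6} $$

Coefficient of Correlation r = 0.795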
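
The working of Illustration 1 can be cross-checked mechanically. What follows is only a minimal Python sketch, not part of the original method: the function name grouped_pearson and the list-of-rows layout of the table are assumptions made for illustration. It applies the step deviation formula directly to the cell frequencies.

```python
from math import sqrt

def grouped_pearson(freq, dx, dy):
    """Karl Pearson's r for a bivariate frequency table (step deviation formula).

    freq[i][j] is the frequency for Y-class i and X-class j;
    dx[j] and dy[i] are the step deviations of those classes.
    """
    n = sum_fdx = sum_fdy = sum_fdx2 = sum_fdy2 = sum_fdxdy = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            sum_fdx += f * dx[j]
            sum_fdy += f * dy[i]
            sum_fdx2 += f * dx[j] ** 2
            sum_fdy2 += f * dy[i] ** 2
            sum_fdxdy += f * dx[j] * dy[i]
    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    denominator = sqrt((n * sum_fdx2 - sum_fdx ** 2) * (n * sum_fdy2 - sum_fdy ** 2))
    return numerator / denominator

# Illustration 1: rows are wives' age classes, columns are husbands' age classes.
freq = [
    [5, 9, 3, 0, 0],
    [0, 10, 25, 2, 0],
    [0, 1, 12, 2, 0],
    [0, 0, 4, 16, 5],
    [0, 0, 0, 4, 2],
]
steps = [-2, -1, 0, 1, 2]
print(round(grouped_pearson(freq, dx=steps, dy=steps), 3))  # 0.795
```

Because r does not change with the choice of assumed mean or class interval, the same function can be used with actual mid values in place of step deviations.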


Illustration 2.
Find out Karl Pearson's correlation coefficient from the following table :

                      X-Series
Y-Series     0-5   5-10  10-15  15-20  20-25
0-4           1     2      -      -      -
4-8           -     4      5      8      -
8-12          -     -      3      4      -
12-16         -     -      -      2      1


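Solution :

Each cell shows the frequency f, with f·dx'·dy' in brackets.

Y-Series                        X-Series                              Total
Class  m.v.  dy  dy'    0-5     5-10    10-15    15-20    20-25        f   fdy'  fdy'²  fdx'dy'
          m.v. of X :   2.5      7.5     12.5     17.5     22.5
                 dx :   -10       -5        0        5       10
                dx' :    -2       -1        0        1        2
0-4      2   -4  -1    1 (2)    2 (2)      -        -        -         3    -3     3       4
4-8      6    0   0      -      4 (0)    5 (0)    8 (0)      -        17     0     0       0
8-12    10    4   1      -        -      3 (0)    4 (4)      -         7     7     7       4
12-16   14    8   2      -        -        -      2 (4)    1 (4)       3     6    12       8
Total f                  1        6        8       14        1        30    10    22      16
fdx'                    -2       -6        0       14        2         8
fdx'²                    4        6        0       14        4        28
fdx'dy'                  2        2        0        8        4        16

Here, dx' = (X - 12.5)/5 and dy' = (Y - 6)/4.

Correlation coefficient,

$$ r = \frac{N\sum f\,dx'\,dy' - \sum f\,dx' \cdot \sum f\,dy'}{\sqrt{N\sum f\,dx'^2 - (\sum f\,dx')^2}\;\sqrt{N\sum f\,dy'^2 - (\sum f\,dy')^2}} $$

$$ r = \frac{30 \times 16 - (8)(10)}{\sqrt{30 \times 28 - (8)^2}\;\sqrt{30 \times 22 - (10)^2}} = \frac{400}{\sqrt{776 \times 560}} = \frac{400}{659.2} = 0.607 $$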


Exercise (B)
1. Find out the coefficient of correlation between the ages of husbands and
wives from the following table :

Age of Husbands            Age of Wives (Y)
(X)            20-25  25-30  30-35  35-40  40-45  Total
25-30            3      -      -      -      -       3
30-35            -      2      2      -      -       4
35-40            -      5      3      3      -      11
40-45            -      -      3      4      -       7
45-50            -      -      1      -      4       5
Total            3      7      9      7      4      30

[Ans. r = 0.81 ]


2. Find out Karl Pearson's coefficient of correlation from the following table :

                       X
Y         18   19   20   21   22   Total
0-5        -    -    -    3    1      4
5-10       -    -    -    3    2      5
10-15      -    -    7   10    -     17
15-20      -    5    4    -    -      9
20-25      3    2    -    -    -      5
Total      3    7   11   16    3     40

[Ans. r = -0.837 ]

3. 100 students of Account and Statistics were examined and their marks were
classified as under. Find out the relationship between their knowledge of the
two subjects :

Statistics              Account
             20-25  25-30  30-35  35-40
15-25          20     10      2      -
25-35           4     24      6      -
35-45           -      5     11      2
45-55           -      5      9      1
55-65           -      -      -      1

[Ans. r = 0.669 ]

4. Calculate coefficient of correlation from the following data : 
y-3-2-10+1+2+3 
x 

—310 

—21668 


—118148 
0410186 


+14612 
+266 
+38 


[Ans. r =— 0.86 ] 
5. Calculate the coefficient of correlation between the ages of husband and
wife from the following distribution table :

Age of Husband             Age of Wife
               15-25  25-35  35-45  45-55  55-65
20-30            5      -      -      -      -
30-40            9     10      1      -      -
40-50            3     25     12      4      -
50-60            -      2      2     16      4
60-70            -      -      -      5      2

[Ans. r = 0.795 ]


4. Spearman's Rank Difference Coefficient of Correlation

A famous English psychologist, Charles Edward Spearman,
introduced a new method in 1904 to find correlation in an individual
series. We call it Spearman's rank difference method or ranking
method.

This method is used in those situations where the facts cannot be
measured directly in the form of quantity. For example,
intelligence, beauty, honesty, health, character etc. are qualitative
facts which cannot be measured directly in figures. If a
foreman wants to promote his workmen or assistants, he will have to
put them in order as first, second, third, fourth etc. according to their
efficiency. The foreman cannot measure the efficiency of his
assistants directly. The correlation coefficient can be calculated by
Spearman's rank difference method on the basis of the ranks given to
the assistants according to their efficiency.


Method to calculate coefficient of correlation by Spearman's
rank difference : When the correlation coefficient is calculated
by the rank difference method, two cases arise :

(1) When the ranks of item values in individual series are given.

(2) When the ranks of item values in individual series are not
given.

(1) When ranks of individual item values are given : When
ranks of individual item values are given, the following steps should
be followed to find the correlation coefficient :

(i) Find the difference between the ranks of items in the two series.
These differences are denoted by D. The sum of the differences is
always zero, i.e., ΣD = 0.

(ii) Find the sum of squares of the differences between the ranks.
Thus we obtain ΣD².

(iii) The following formula of Spearman should be used to find the
correlation coefficient :

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} \quad \text{or} \quad r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

In this formula,

r_s = Spearman's rank difference correlation coefficient

ΣD² = Sum of squares of the differences between the ranks of
items in the two series,

N = Number of items

(iv) Spearman's rank difference correlation coefficient also lies
between -1 and +1 like Karl Pearson's correlation coefficient and
it is interpreted in the same way. In other words, if the rank
difference correlation coefficient (r_s) is more than 0.75 but less
than 1 there is a high degree of correlation, when it lies between
0.5 and 0.75 there is moderate correlation, and when it is
less than 0.5 there is a low degree of correlation.


Illustration 1.
Before the training course and after it, 10 students are ranked in the
following way :

Students                A   B   C   D   E   F   G   H   I   J
Ranks before Training   1   6   3   9   5   2   7  10   8   4
Ranks after Training    6   8   3   7   2   1   5   9   4  10

Find out Spearman's coefficient of correlation.

Solution :

In the above question the ranks of the ten students before training and
after training are already given, so Spearman's correlation coefficient
will be determined by the first method.

Calculation of Spearman's Correlation Coefficient

Students   Ranks before   Ranks after   Difference between   Square of rank
           Training       Training      two ranks (R1 - R2)  difference
           R1             R2            (D)                  (D²)
A           1              6             -5                   25
B           6              8             -2                    4
C           3              3              0                    0
D           9              7             +2                    4
E           5              2             +3                    9
F           2              1             +1                    1
G           7              5             +2                    4
H          10              9             +1                    1
I           8              4             +4                   16
J           4             10             -6                   36

N = 10                                                   ΣD² = 100

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} $$

In this formula,

r_s = Spearman's rank difference correlation coefficient

ΣD² = Sum of squares of rank differences of both the series

N = No. of paired items

Substituting the data,

$$ r_s = 1 - \frac{6 \times 100}{10^3 - 10} = 1 - \frac{600}{1000 - 10} = 1 - \frac{600}{990} = 1 - 0.606 $$

r_s = 0.394
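
The three steps of the first method (rank differences, their squares, and the formula) can be written as a short Python sketch. The function name spearman_from_ranks is an assumption made for illustration only. It is checked here against the ranks before and after training from Illustration 1.

```python
def spearman_from_ranks(r1, r2):
    """Spearman's rank difference coefficient when the ranks are already given."""
    n = len(r1)
    differences = [a - b for a, b in zip(r1, r2)]
    # The rank differences must always add up to zero.
    assert sum(differences) == 0
    sum_d2 = sum(d * d for d in differences)
    return 1 - 6 * sum_d2 / (n ** 3 - n)

before = [1, 6, 3, 9, 5, 2, 7, 10, 8, 4]
after = [6, 8, 3, 7, 2, 1, 5, 9, 4, 10]
print(round(spearman_from_ranks(before, after), 3))  # 0.394
```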


Illustration 2.

In a beauty contest 10 competitors are ranked by three judges in the
following order :

Ranks given to 10 competitors

First Judge     1   5   4   8   9   6  10   7   3   2
Second Judge    4   8   7   6   5   9  10   3   2   1
Third Judge     6   7   8   1   5  10   9   2   3   4

Using Spearman's rank correlation, determine which pair of judges has the
nearest approach to common taste in beauty.

Solution :
Table

R1   R2   R3   D1 = R1-R2   D2 = R2-R3   D3 = R1-R3   D1²   D2²   D3²
 1    4    6      -3           -2           -5          9     4    25
 5    8    7      -3           +1           -2          9     1     4
 4    7    8      -3           -1           -4          9     1    16
 8    6    1      +2           +5           +7          4    25    49
 9    5    5      +4            0           +4         16     0    16
 6    9   10      -3           -1           -4          9     1    16
10   10    9       0           +1           +1          0     1     1
 7    3    2      +4           +1           +5         16     1    25
 3    2    3      +1           -1            0          1     1     0
 2    1    4      +1           -3           -2          1     9     4

N = 10        ΣD1² = 74        ΣD2² = 44        ΣD3² = 156

Applying the formula of Spearman's rank difference
correlation coefficient,

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} $$

Substituting the values,

(i) Correlation coefficient between the ranks given by first and
second judges.

$$ r_s = 1 - \frac{6 \times 74}{10^3 - 10} = 1 - \frac{444}{990} = 1 - 0.448 = 0.552 $$

(ii) Correlation coefficient between the ranks given by second
and third judges.

$$ r_s = 1 - \frac{6 \times 44}{10^3 - 10} = 1 - \frac{264}{1000 - 10} = 1 - \frac{264}{990} = 1 - 0.267 = 0.733 $$

(iii) Correlation coefficient between the ranks given by first
and third judges.

$$ r_s = 1 - \frac{6 \times 156}{10^3 - 10} = 1 - \frac{936}{990} = 1 - 0.945 = 0.055 $$

Ans. : The rank difference correlation coefficient between the
ranks given by second and third judges is the maximum. Hence the
second and third judges have the nearest approach to common taste
in beauty.

(2) Calculation of rank difference correlation coefficient when
the ranks of items in the series are not given : When the ranks of
items in the series are not given, the following steps should be
followed to find the rank difference correlation coefficient :

(i) Ranks are given to the item values in the X and Y series according
to their size. The highest item value is given rank 1, the next
highest item value is given rank 2, the next highest item value is
given rank 3 and so on. Alternatively, the smallest item value is
given rank 1, the next smallest value is given rank 2 and so on.
The same rule must be followed in both the series, either taking the
highest value as 1 or the lowest value as 1.

(ii) In case of a tie : If the values of two or more items in the
series are the same, the rank in such a case is determined by taking
the average of the ranks which these items would have occupied
had they differed slightly from each other. The next rank is given to
the value after them. For example, if the highest item value is 100 in
the X-series then its rank will be 1. The next item value is 90, so its
rank will be 2. After it there are three item values 70, 70, 70. These
would have occupied ranks 3, 4 and 5, so the rank of each is the
average of these three ranks, i.e., (3 + 4 + 5)/3 = 4.

So rank 4 will be given to each 70. Rank 6 will be given to the next
item value, say, 65.

(iii) Subtracting the ranks of the item values in the X and Y series
from each other, the difference of ranks (D) should be obtained.

(iv) Squaring these rank differences, their total should be
calculated. Thus ΣD² will be obtained.

(v) Finally, the formula of Spearman's rank difference correlation
coefficient should be used :

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} \quad \text{or} \quad r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

(vi) When the same rank is given to two or more item
values in the series, an adjustment is made in the above
formula to find the rank correlation coefficient.

The corrected formula will be as follows :

$$ r_s = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots\right]}{N^3 - N} $$

where m is the number of times that an item is repeated.

Note : The correction factor (m³ - m)/12 is added for each repeated
value in both series.

Illustration 3.
For the following data calculate the correlation coefficient by the rank
difference method :

X    78   89   97   69   59   79   68   57
Y   125  137  156  112  107  136  123  108

Solution :

X     Y     Rank of X (R1)   Rank of Y (R2)   D = R1 - R2   D²
78    125         4                4               0         0
89    137         2                2               0         0
97    156         1                1               0         0
69    112         5                6              -1         1
59    107         7                8              -1         1
79    136         3                3               0         0
68    123         6                5               1         1
57    108         8                7               1         1

Total                                           ΣD = 0    ΣD² = 4

Correlation coefficient

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 4}{8^3 - 8} = 1 - \frac{24}{504} = 1 - 0.05 = 0.95 $$


Illustration 4.
Find out the rank coefficient of correlation from the following data :

X   48   33   40    9   16   16   65   25   15   57
Y   13   13   24    6   15    4   20    9    6   19

(Indore 2004)

Solution :
Calculation of Spearman's rank difference correlation coefficient

X     R1     Y     R2     D = R1 - R2   D² = (R1 - R2)²
48    3      13    5.5       -2.5            6.25
33    5      13    5.5       -0.5            0.25
40    4      24    1         +3.0            9.00
 9   10       6    8.5       +1.5            2.25
16    7.5    15    4         +3.5           12.25
16    7.5     4   10         -2.5            6.25
65    1      20    2         -1.0            1.00
25    6       9    7         -1.0            1.00
15    9       6    8.5       +0.5            0.25
57    2      19    3         -1.0            1.00

N = 10                                  ΣD² = 39.50

Applying the formula of Spearman's rank difference correlation
coefficient,

$$ r_s = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots\right]}{N^3 - N} $$

Here m is the number of times an item value is repeated. The value
16 occurs twice in the X-series, and 13 and 6 each occur twice in the
Y-series, so m = 2 in each of the three cases.

Substituting the values,

$$ r_s = 1 - \frac{6\left[39.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} $$

$$ r_s = 1 - \frac{6[39.5 + 0.5 + 0.5 + 0.5]}{1000 - 10} = 1 - \frac{6(41)}{990} = 1 - \frac{246}{990} $$

$$ r_s = 1 - 0.248 = 0.752 $$
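
The ranking rule for ties and the corrected formula can be combined in a short Python sketch. The helper names average_ranks and spearman_with_ties are assumptions made for illustration, and the highest value is taken as rank 1, as in the illustration above.

```python
from collections import Counter

def average_ranks(values):
    """Rank values with the highest as 1; tied values share the mean of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_with_ties(x, y):
    """Spearman's coefficient with the (m^3 - m)/12 correction for repeated values."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    # One correction term for every repeated value in either series.
    correction = sum(
        (m ** 3 - m) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

x = [48, 33, 40, 9, 16, 16, 65, 25, 15, 57]
y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]
print(average_ranks(x))                     # ranks R1, with 7.5 for both 16s
print(round(spearman_with_ties(x, y), 3))   # 0.752
```

When no value is repeated the correction is zero and the function reduces to the ordinary formula.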


Illustration 5.
From the following data calculate Spearman's rank coefficient of correlation :

Serial number      1    2    3    4    5    6    7    8    9   10
Rank difference   -2   -4   -1   +3   +2    0    ?   +3   +3   -2

(Indore 2004)

Solution :
Let the unknown rank difference be x.

The sum of rank differences is always zero, i.e., ΣD = 0

or -2 - 4 - 1 + 3 + 2 + 0 + x + 3 + 3 - 2 = 0

-9 + 11 + x = 0

2 + x = 0

x = -2

Thus the unknown rank difference should be -2.

So ΣD² = 4 + 16 + 1 + 9 + 4 + 0 + 4 + 9 + 9 + 4 = 60

$$ r_s = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 60}{10^3 - 10} = 1 - \frac{360}{1000 - 10} = 1 - \frac{360}{990} $$

$$ r_s = 1 - 0.36 = 0.64 $$
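
The reasoning of this illustration rests on the fact that rank differences always add up to zero. A minimal Python sketch of the same working is given below. The function name spearman_with_missing_difference is an assumption made for illustration, and it assumes that exactly one rank difference is missing.

```python
def spearman_with_missing_difference(known_differences, n):
    """Recover one missing rank difference (using sum of D = 0) and compute r_s."""
    missing = -sum(known_differences)
    sum_d2 = sum(d * d for d in known_differences) + missing ** 2
    return missing, 1 - 6 * sum_d2 / (n ** 3 - n)

known = [-2, -4, -1, 3, 2, 0, 3, 3, -2]   # the nine given rank differences
missing, r_s = spearman_with_missing_difference(known, n=10)
print(missing, round(r_s, 2))  # -2 0.64
```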


Merits and Demerits of Spearman's Rank Difference Coefficient of
Correlation Method

Merits :

(1) Spearman's rank difference method is very simple as
compared to Karl Pearson's correlation coefficient method. The
reason is that Karl Pearson's method requires laborious
multiplication and division to find the covariability, while in
Spearman's rank difference method this work is made very simple
by using ranks.

(2) It can be used to measure the degree of association between
two attributes like intelligence, honesty, efficiency, health, beauty
etc., provided that each individual can be arranged in a rank.

(3) If the actual value of each item in the series is given, the
degree of correlation can also be determined by the rank difference
method. For this, ranks are given to the values according to
their size.

(4) This method may also be used conveniently when the data in the
series are irregular. The reason is that the rank difference
coefficient of correlation is not based on the assumption of normality
of data.

Demerits :

(1) This method can be used only in individual series. It cannot be
used in a classified frequency distribution.

(2) Under the rank difference method the correlation coefficient is
not calculated on the basis of actual values. Hence the result
obtained by this method shows the tendency, not the real correlation.

(3) When one value in the series repeats a number of times, the
calculation work becomes a little difficult because of equal ranks.

(4) When the number of paired items exceeds 30, the calculation
work becomes difficult and tedious and requires a lot of time. Hence
the use of this method is limited.


5. Coefficient of Concurrent Deviations

The concurrent deviation method is the simplest method to find the
correlation coefficient. This method is used to find the direction of
correlation. It does not give knowledge about the degree of
correlation.

Method : The following steps are followed to find the correlation
coefficient in this method :

(1) The direction of the changes of values of the X-variable is
determined. To find the direction of change, each value is compared
with the value before it. If it has increased, we put a plus
(+) sign. If it has decreased, we put a minus (-) sign. If both values
are the same, we put an equal sign (=).

(2) In the same way the direction of changes in the Y-variable is
determined. The deviation signs of the X-variable are denoted by dx
and the deviation signs of the Y-variable are denoted by dy.

(3) Multiplying the deviation signs of the X and Y variables together,
the product dx·dy is obtained. We put (+) for (+) × (+), (+) for
(-) × (-), (-) for (+) × (-) and (-) for (-) × (+) in the product
column.

(4) To find the number of concurrent deviations, C, we count the
plus (+) signs in the product column (dx·dy).

(5) Lastly, we apply the following formula to find the correlation
coefficient by the concurrent deviation method :

$$ r_c = \pm\sqrt{\pm\frac{2C - N}{N}} $$

Explanation of the formula :

Here, r_c = Correlation coefficient of concurrent deviation

C = Number of concurrent deviations

N = Number of paired deviations. This number will be 1 less than
the total number of items.

Note : If the quantity (2C - N)/N is negative, its square root cannot
be determined. This negative quantity is therefore multiplied by a
minus (-) sign, the square root is determined, and a minus (-) sign is
written before the correlation coefficient. That is why the sign (±)
is written both before and inside the square-root sign.


Illustration 6.
Find out the coefficient of correlation by the concurrent deviation method
from the following data :

Price    368  384  385  361  347  384  395  403  400  385
Import    22   21   24   20   22   26   24   29   28   27

Solution :
Calculation of Correlation Coefficient through Concurrent
Deviation Method

X-Variable              Y-Variable              Product of
Price   Deviation sign  Import  Deviation sign  Deviation signs
x       dx              y       dy              dx·dy
368                     22
384     +               21      -               -
385     +               24      +               +
361     -               20      -               +
347     -               22      +               -
384     +               26      +               +
395     +               24      -               -
403     +               29      +               +
400     -               28      -               +
385     -               27      -               +

        N = 9                   N = 9           C = 6

Applying the formula of concurrent deviation correlation coefficient,

$$ r_c = \pm\sqrt{\pm\frac{2C - N}{N}} = +\sqrt{\frac{2 \times 6 - 9}{9}} = +\sqrt{0.3333} = +0.577 $$

Ans. : There is a moderate degree of positive correlation between
price and import.


Illustration 7.

Find out the coefficient of correlation by the concurrent deviation method
from the following data :

Year               2001  2002  2003  2004  2005  2006  2007  2008  2009  2010

Supply Index No.    112   125   126   118   118   121   125   125   131   135

Price Index No.     106   102   102   104    98    96    97    97    95    90

Solution :
Calculation of correlation coefficient through concurrent
deviation method

        X-Variable             Y-Variable             Product of
Year    Supply      Deviation  Price       Deviation  Deviation
        Index No.   sign       Index No.   sign       signs
        x           dx         y           dy         dx·dy
2001    112                    106
2002    125         +          102         -          -
2003    126         +          102         =
2004    118         -          104         +          -
2005    118         =           98         -
2006    121         +           96         -          -
2007    125         +           97         +          +
2008    125         =           97         =          +
2009    131         +           95         -          -
2010    135         +           90         -          -

                    N = 9                  N = 9      C = 2

Applying the formula of concurrent deviation correlation coefficient,

$$ r_c = \pm\sqrt{\pm\frac{2C - N}{N}} = \pm\sqrt{\pm\frac{2 \times 2 - 9}{9}} = \pm\sqrt{\pm\left(-\frac{5}{9}\right)} = \pm\sqrt{\pm(-0.5556)} $$

Note : Since (2C - N)/N is negative, we multiply it by a minus (-)
sign to find its square root and put a minus (-) sign before the
square-root sign.

$$ r_c = (-)\sqrt{(-)(-0.5556)} = -\sqrt{0.5556} $$

r_c = -0.745

Ans. : There is a high degree of negative correlation between
supply and price.
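
The sign-counting procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are assumptions, and a pair of deviations is counted as concurrent when both signs are the same, including the case where both are (=), which follows the treatment of the year 2008 in Illustration 7.

```python
from math import sqrt

def deviation_signs(values):
    """Sign of each change from the previous value: '+', '-' or '='."""
    signs = []
    for previous, current in zip(values, values[1:]):
        signs.append('+' if current > previous else '-' if current < previous else '=')
    return signs

def concurrent_deviation(x, y):
    """Coefficient of concurrent deviations r_c = +/- sqrt(+/- (2C - N) / N)."""
    dx, dy = deviation_signs(x), deviation_signs(y)
    n = len(dx)                                   # one less than the number of items
    c = sum(1 for sx, sy in zip(dx, dy) if sx == sy)
    ratio = (2 * c - n) / n
    # A negative ratio is made positive under the root and the sign is restored.
    return sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)

supply = [112, 125, 126, 118, 118, 121, 125, 125, 131, 135]
price = [106, 102, 102, 104, 98, 96, 97, 97, 95, 90]
print(round(concurrent_deviation(supply, price), 3))  # -0.745
```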

Merits and Demerits of Concurrent Deviation Method

Merits :

(1) This method is the simplest method to find correlation.

(2) When the number of items is very large, this method is used to
get a quick idea about the direction of correlation between them.

Demerits :

(1) It gives only a rough idea of the correlation between two variables.

(2) This method does not differentiate between big and small
changes. For example, if the value of X increases from 100 to 1000
we put a plus (+) sign for this deviation. Similarly, if the value
of the Y variable increases from 100 to 101 we also put a plus (+)
sign for this deviation. Thus their concurrent deviation will be
denoted by a plus (+) sign, but the magnitudes of these deviations are
not considered.

(3) It tells only about the direction of changes. It does not measure
their degree. Owing to the above demerits, this method is not much used.


6. Least Square Method

In the least square method, the correlation coefficient is calculated on
the basis of the line of best fit. The best plausible values of the
Y-series are calculated for the values of the X-series in this method and
the correlation coefficient is calculated on the basis of these values.
The method of calculation is as follows :

(1) First of all, the best plausible values (Yc) of the Y-series are
calculated for the given values of the X-series with the help of the
straight line Y = a + bX. For it, the values of a and b are found with
the help of the following two normal equations :

$$ \sum Y = Na + b\sum X $$

$$ \sum XY = a\sum X + b\sum X^2 $$

where, ΣY = Total of all items in Y-series

N = Number of items in the series

ΣX = Total of all items in X-series

ΣXY = Total of products of items of X and Y series

ΣX² = Total of squares of items in X-series

Substituting these values from the table, the values of a and b are
determined by solving the two simultaneous equations.

(2) The values of a and b obtained by the above method are substituted
in the equation Y = a + bX. This equation is applied to each value of X
in turn. Thus the computed values of Y (i.e., Yc) are obtained
for all values of X.

(3) After it, subtracting the computed values from the actual values
of the Y-series, the deviation (Y - Yc) for each value is obtained.

(4) After squaring these deviations, the sum of squares Σ(Y - Yc)²
is determined.

(5) The arithmetic mean of the squares of deviations is determined from
the following formula :

$$ S_y^2 = \frac{\sum (Y - Y_c)^2}{N} $$

This is the measure of variation about the line of best fit. It is also
called Unexplained Variance.

(6) After it, the variance of the Y-series is calculated with the
following formula :

$$ \sigma_y^2 = \frac{\sum (Y - \bar{Y})^2}{N} \quad \text{or} \quad \frac{\sum d^2}{N} $$

(7) Lastly, the correlation coefficient is calculated with the help of
the following formula :

$$ r = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}} $$

(8) In the above calculation the sign of r will be the same as the
sign of b, that is, if b is positive then r will also be positive and
if b is negative then r will also be negative.


Illustration 8.

From the following series calculate the coefficient of correlation by the
least square method :

X     1    2    3    4    5
Y    82   91   70   89  168

Solution : Calculation of estimated values (Yc) of Y

X       Y        XY        X²      Estimated value of Y = Yc
                                   (applying the formula Y = a + bX)
1       82       82        1       49 + (17 × 1) = 66
2       91       182       4       49 + (17 × 2) = 83
3       70       210       9       49 + (17 × 3) = 100
4       89       356       16      49 + (17 × 4) = 117
5       168      840       25      49 + (17 × 5) = 134
ΣX=15   ΣY=500   ΣXY=1670  ΣX²=55

Substituting the values in the normal equations (i) ΣY = Na + bΣX
and (ii) ΣXY = aΣX + bΣX²,

500 = 5a + 15b ....(i)
1670 = 15a + 55b ....(ii)

Multiplying equation (i) by 3 and subtracting it from equation (ii),

1670 = 15a + 55b ....(ii)
1500 = 15a + 45b ....(i) × 3
 170 = 10b

So b = 17

Substituting the value of b in equation (i),

500 = 5a + 17 × 15, so 500 - 255 = 5a

a = 49

Calculation of S_y² and σ_y²

X     Y      Yc     (Y - Yc)   (Y - Yc)²   d = (Y - Ȳ)   d² = (Y - Ȳ)²
1     82     66       16         256         -18           324
2     91     83        8          64          -9            81
3     70     100     -30         900         -30           900
4     89     117     -28         784         -11           121
5     168    134      34        1156          68          4624

ΣY = 500           Σ(Y - Yc)² = 3160            Σ(Y - Ȳ)² = 6050

$$ \bar{Y} = \frac{\sum Y}{N} = \frac{500}{5} = 100 $$

$$ S_y^2 = \frac{\sum (Y - Y_c)^2}{N} = \frac{3160}{5} = 632, \qquad \sigma_y^2 = \frac{\sum d^2}{N} = \frac{6050}{5} = 1210 $$

$$ r = \sqrt{1 - \frac{S_y^2}{\sigma_y^2}} = \sqrt{1 - \frac{632}{1210}} = \sqrt{0.4777} = 0.69 $$

Since b is positive, the correlation coefficient will be r = + 0.69
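
The eight steps of the least square method can be followed in a short Python sketch. The function name least_squares_r is an assumption made for illustration. The two normal equations are solved in closed form rather than by elimination, which gives the same a and b.

```python
from math import sqrt

def least_squares_r(x, y):
    """Return (a, b, r) for the line Y = a + bX and the least square correlation."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Solution of the normal equations: sum Y = Na + b sum X, sum XY = a sum X + b sum X^2
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    computed = [a + b * xi for xi in x]                       # the values Yc
    unexplained = sum((yi - yc) ** 2 for yi, yc in zip(y, computed)) / n
    mean_y = sum_y / n
    variance = sum((yi - mean_y) ** 2 for yi in y) / n
    r = sqrt(max(0.0, 1 - unexplained / variance))
    return a, b, (r if b >= 0 else -r)                        # r takes the sign of b

a, b, r = least_squares_r([1, 2, 3, 4, 5], [82, 91, 70, 89, 168])
print(a, b, round(r, 2))  # 49.0 17.0 0.69
```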
Exercise (C)
1. Two women rank 7 types of lipstick in the following order :

Lipstick   A  B  C  D  E  F  G
Veena      2  1  4  3  5  7  6
Neena      1  3  2  4  5  6  7

Find out Spearman's rank difference coefficient of correlation.

[ Ans. r = 0.66]

2. The following ranks have been allotted to 11 competitors of a Fashion
Competition by two referees :

Competitors       A  B  C  D  E  F  G  H  I   J   K
First Examiner    1  2  3  4  5  6  7  8  9  10  11
Second Examiner   2  3  1  6  4  5  8  7  10 11   9

Find out the rank coefficient of correlation.

[ Ans. r = 0.909]

3. Find out Spearman's rank difference coefficient of correlation between
X and Y :

X 20 22 24 25 30 32 28 21 26 35

Y 16 15 20 21 19 18 22 24 23 25

[Hint : (i) Give ranks to the item values of X and Y according to their
size, e.g., 1st rank to 35, then to 32 etc.

(ii) After the determination of ranks use Spearman's rank difference method ]

[ Ans. r_s = + 0.32 approximately]

4. Find out the coefficient of correlation from the following data by using
Spearman's rank difference method :

Supply 160 164 172 182 166 170 178 192 185

Price 292 280 260 234 266 254 230 190 200

[ Ans. r_s = -0.967 ]

5. Find out the coefficient of correlation by using Spearman's rank
difference method from the following data :

X 95 88 85 85 78 77 69 60

Y 20 23 24 24 24 26 25 30

[ Ans. r_s = -0.92 approximately]

[Hint : (1) Find Spearman's rank difference correlation coefficient.

(2) Since 85 comes two times and 24 comes three times in the series, apply
the corrected formula. ]

6. From the following data calculate Spearman's rank coefficient of correlation :

Serial Number      1    2    3    4    5    6    7    8    9   10
Rank Difference   -2   -4   -1   +3   +2    0    x   +3   +3   -2

(Indore 2004)

[ Ans. 0.64 ]

7. Calculate the rank correlation coefficient for the following data : 

X 92 89 87 86 83 77 71 63 53 50 

Y 86 83 91 77 68 85 52 82 37 57 

[ Ans. 0.73 ] 

8. The ranks of the same students in two subjects A and B were as follows
(the two numbers within brackets denote the ranks of a student in A and B
respectively) :

(1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6), (9, 8), (10, 11), (11, 15), (12,
9), (13, 14), (14, 12), (15, 16), (16, 13).

Find out if there is any correlation between the two subjects.

[ Ans. 0.8 ]

9. The coefficient of rank correlation between marks in Statistics and marks in
Accountancy obtained by a certain group of students is 0.8. If the sum of squares
of the differences in ranks is given to be 33, find the number of students in the
group.

[ Ans. n = 10 ]

10. Ten students who were examined in Cost Account and Corporate Account
got the following ranks. To what extent is the knowledge of the students in the
two subjects related ? Calculate the rank coefficient of correlation :

Cost Account        1  2  3  4  5  6  7   8  9  10
Corporate Account   2  4  1  5  3  9  7  10  6   8

[ Ans. 0.76 ]


11. Calculate the coefficient of concurrent deviation from the data given below : 
Year 2002 2003 2004 2005 2006 2007 2008 2009 2010 

Supply 160 164 172 182 166 170 178 192 186 

Price 292 280 260 234 266 254 230 190 200 


[ Ans. —1 ] 
12. Obtain a suitable measure of correlation from the following data regarding
changes in the price index of two shares X and Y during the year.

Changes over the previous month

          Jan. Feb. Mar. Apr. May  June July Aug. Sep. Oct. Nov. Dec.
Share X    +5   +3   +2   -1   -2   -3   +5   -6   +1   +2   +7   -3
Share Y    -2   +6   +3   -2   -6   +1   -7   +2   -3   +4   +6   +2

[ Ans. 0 (Zero) ]

13. Find out the coefficient of correlation by the concurrent deviation method
from the following data :

X 10 12 13 13 11 11 12

Y 5 4 6 6 8 5 6

[ Ans. 0 ]

14. Find out the coefficient of correlation by the concurrent deviation method
for the following data :

Years 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

X 112 125 126 118 118 121 125 125 131 135

Y 106 102 102 104 98 96 97 97 95 90

[ Ans. -0.75 ]

15. Find out the coefficient of correlation by the concurrent deviation
method :

Female                   1   2   3   4   5    6    7    8   9  10   11   12

Workshop A Production   80  70  70  60  90  100  105  110  80  75  100  110

Workshop B Production   40  35  35  32  40   45   48   50  40  30   35   40

[ Ans. r_c = +1 ]

16. On the basis of the following information find out the coefficient of
correlation by the concurrent deviation method :

Year 2002 2003 2004 2005 2006 2007 2008 2009 2010

Supply 160 164 172 182 166 170 178 178 186

Price 292 300 340 234 266 254 230 230 180

[ Ans. r_c = -0.5 ]

17. From the following data calculate the coefficient of correlation by using
the least square method :

x 2 4 5 6 8 11

y 18 12 10 8 7 5

[ Ans. r = -0.92 ]

18. From the following data calculate the coefficient of correlation by using
the least square method :

x 1 3 5 7 9

y 4 6 7 7 10

[ Ans. r =—0.097 ]

19. Calculate the coefficient of correlation between the values of x and y by the
method of least squares :

x 2 3 4 5 6 7 8

y 1 2 3 4 5 6 7

[ Ans. + 1 ]

PROBABLE ERROR

Probable error is used to test the reliability of Karl Pearson's
correlation coefficient. Probable error is an amount which is added
to or subtracted from the correlation coefficient to find the two limits
within which the correlation coefficient of that series, or the
correlation coefficient between the samples selected by random
sampling from the population, is expected to lie.

The formula for probable error is :

$$ \text{P.E. of } r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} \quad \text{or} \quad \text{P.E. of } r = \frac{2}{3} \times \frac{1 - r^2}{\sqrt{N}} $$

Here, P.E. = Probable error

r = Correlation coefficient

N = Number of items.

Functions of Probable Error

(1) Determination of limits of correlation coefficient : Probable
error determines the two limits (r ± P.E.) of the correlation coefficient
within which the correlation coefficient between the samples selected
by random sampling from the universe is expected to lie.

(2) Interpretation of r : The second important function of probable
error is to test the reliability of the correlation coefficient. It is
done in the following way :

(a) If the correlation coefficient (r) is less than the
probable error, there is no correlation between the two variables.

(b) If the value of the correlation coefficient (r) is more than 6 times
the value of the probable error, the correlation is significant.

Correlation is significant if r > 6 P.E.

If the correlation coefficient is less than 6 times the probable error,
the correlation is insignificant, i.e., r < 6 P.E.

(c) If the probable error is very small and the correlation coefficient
(r) is 0.5 or more, it is considered to be a fairly high
degree of correlation.

(d) If the probable error is very small and the correlation coefficient
(r) is even less than 0.3, it is considered to be an insignificant
correlation.

(e) If the correlation coefficient is less than the probable error, i.e.,
r < P.E., there is no evidence of the existence of correlation
between the two series.

(f) If the correlation coefficient is more than 0.5 and its probable error
is very small, the existence of correlation is almost definite.

(g) Probable error should be used in the interpretation of the
correlation coefficient only when the number of items is sufficiently
large. If the number of items is small, probable error gives fallacious
conclusions about the correlation coefficient.

(h) Probable error should be used to test the reliability of the
correlation coefficient only when the sampling method has been used in
the research. It is also essential that the sample is unbiased, large
in size and representative.


Illustration 1.
If P.E. = 0.0089 and N = 9, find out the coefficient of correlation and point out
its significance.

Solution :

$$ \text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}, \qquad 0.0089 = 0.6745 \times \frac{1 - r^2}{\sqrt{9}} = 0.6745 \times \frac{1 - r^2}{3} $$

$$ 1 - r^2 = \frac{0.0089 \times 3}{0.6745} = 0.03958 $$

r² = 1 - 0.03958 = 0.9604

r = 0.98

Test of significance : 6 P.E. = 6 × 0.0089 = 0.0534
Now, 0.98 > 0.0534

r is greater than 6 times the P.E. So the value of r is significant.
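
The probable error formula and the 6 P.E. test can be checked with a small Python sketch. The function names probable_error and is_significant are assumptions made for illustration. The check below works in the opposite direction to the illustration: it starts from r = 0.98 and N = 9 and recovers the given P.E.

```python
from math import sqrt

def probable_error(r, n):
    """P.E. of r = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def is_significant(r, n):
    """r is taken as significant when it exceeds 6 times its probable error."""
    return abs(r) > 6 * probable_error(r, n)

pe = probable_error(0.98, 9)
print(round(pe, 4), is_significant(0.98, 9))  # 0.0089 True
```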


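Illustration 2.

Show by calculation which 'r' is more significant :

(i) r = 0.75, P.E. = 0.083 (ii) r = 0.82, P.E. = 0.068

Solution :

(i) r / P.E. = 0.75 / 0.083 = 9.04, so r is 9.04 times its probable error.

(ii) r / P.E. = 0.82 / 0.068 = 12.06, so r is 12.06 times its probable error.

In both cases r is more than 6 times its probable error, and the ratio
is larger in (ii). Hence, in (ii) r is more significant than in (i).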

STANDARD ERROR

Use of standard error is considered better than that of probable
error in modern statistics. Standard error is 3/2 times the probable
error.

The formula of standard error (S.E.) is

$$ \text{S.E. of } r = \frac{1 - r^2}{\sqrt{N}} $$

Here, S.E. = Standard error

r = Correlation coefficient

N = Number of items

Limits of correlation coefficient = r ± 3 S.E. of r, that is,

Lower limit of correlation coefficient = r - 3 S.E.

Upper limit of correlation coefficient = r + 3 S.E.

Correlation coefficient (r) = (Lower limit + Upper limit) / 2

The above fact is clear from the following example.


Illustration 1. 

To find the correlation between the ages of husbands and wives, 100 couples
are taken as a sample from the universe. The coefficient of correlation between
the ages of husbands and wives was found to be + 0.9. Within what limits will
this coefficient of correlation hold ?

Solution : Limits of correlation coefficient = r ± 3 S.E. (Standard Error)

$$\text{Standard Error (S.E.)} = \frac{1 - r^2}{\sqrt{N}}$$

In the question, correlation coefficient r = + 0.9 and N = 100.

Substituting the values in the formula of standard error,

$$\text{S.E.} = \frac{1 - (0.9)^2}{\sqrt{100}} = \frac{1 - 0.81}{10} = \frac{0.19}{10} = 0.019$$

Standard Error (S.E.) = 0.019
Upper limit of correlation coefficient = r + 3 S.E.
= 0.9 + 3 (0.019) = 0.9 + 0.057 = 0.957

Lower limit of correlation coefficient = r − 3 S.E.
= 0.9 − 3 (0.019) = 0.9 − 0.057 = 0.843

$$\text{Correlation coefficient } (r) = \frac{\text{Upper limit} + \text{Lower limit}}{2} = \frac{0.957 + 0.843}{2} = \frac{1.800}{2} = 0.9$$

It is clear that the correlation coefficient (r) will lie between the two
limits 0.843 and 0.957.
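
The calculation of the standard error and of the limits r ± 3 S.E. can be sketched as below. The function names are illustrative assumptions; the formula is S.E. = (1 − r²)/√N as used in the illustration.

```python
import math

def standard_error_of_r(r, n):
    """S.E. of r = (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)

def limits_of_r(r, n):
    """Return (lower, upper) limits r - 3 S.E. and r + 3 S.E."""
    se = standard_error_of_r(r, n)
    return r - 3 * se, r + 3 * se

se = standard_error_of_r(0.9, 100)
lower, upper = limits_of_r(0.9, 100)
print(round(se, 3), round(lower, 3), round(upper, 3))
```

For r = 0.9 and N = 100 this reproduces S.E. = 0.019 and the limits 0.843 and 0.957, whose midpoint is again r.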

Coefficient of Determination 

Coefficient of determination is the square of the correlation coefficient,
i.e., coefficient of determination = $r^2$.

The coefficient of determination shows to what extent the independent
variable is responsible for the change in the dependent variable. The change
in the dependent variable is due partly to the change in the independent
variable and partly to some other causes. For example, an increase in demand
may result from a decrease in price, but factors such as an attack on the
nation, a season unfavourable to the consumption of the commodity, or a huge
reduction in the price of the commodity by a rival producer may also affect
the demand. Hence the variation in the dependent variable (Y) may be due to
two factors :

(i) Variation due to the independent variable (X), known as explained
variation.

(ii) Variation not due to the independent variable but due to some other
factors, known as unexplained variation.

Hence total variation = Explained variation + Unexplained variation

The percentage of variation in Y due to X is determined by the
coefficient of determination.

For example, if the correlation coefficient between price (X) and
demand (Y) of a commodity is − 0.7, then the coefficient of
determination ($r^2$) will be 0.49. It means that 49 percent of the
variation in the demand of the commodity has been explained by the
variation in its price.

The variation in the dependent variable due to other factors is
determined by the coefficient of non-determination. The following is its
formula :

$$\text{Coefficient of non-determination} = k^2 = 1 - r^2$$
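
The split of total variation into explained and unexplained parts can be checked numerically. The sketch below uses a small hypothetical data set (not taken from the text) and a least square line of Y on X; the function name `variation_breakdown` is an assumption made for illustration.

```python
def variation_breakdown(xs, ys):
    """Return (total, explained, unexplained) variation of Y about a least square line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                      # slope of Y on X
    a = mean_y - b * mean_x            # intercept
    estimates = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((e - mean_y) ** 2 for e in estimates)
    unexplained = sum((y - e) ** 2 for y, e in zip(ys, estimates))
    return total, explained, unexplained

# Hypothetical data, chosen only to keep the arithmetic simple.
total, explained, unexplained = variation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = explained / total        # coefficient of determination
k_squared = unexplained / total      # coefficient of non-determination
print(round(r_squared, 2), round(k_squared, 2))

# The example of the text: r = -0.7 gives r^2 = 0.49 and k^2 = 0.51.
print(round((-0.7) ** 2, 2), round(1 - (-0.7) ** 2, 2))
```

In this sketch the two parts add up to the total variation, and the two coefficients add up to 1.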


Lag and Lead 


It is found in practice that a change in the independent variable
takes some time to have its effect upon the dependent variable. This
time-gap between the cause and its effect is called ‘Lag’. For
example, an increase in the price of petrol will not affect the prices of
commodities immediately; they will take some time to rise. To lag means
to fall behind, so time lag means that the effect occurs some time after
the cause. The word ‘lead’ is used in the opposite sense, that is, the
effect occurs before the cause. For example, prices may rise in
anticipation of a likely attack by an enemy country.

Once the time lag is determined, the next step is to push the dependent
variable backwards by that period, so that each value of the independent
variable is paired with the value of the dependent variable that it
affects. The correlation coefficient is then calculated from these pairs.
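
Pairing the series after allowing for the lag can be sketched as follows. The price figures are hypothetical and constructed so that the effect follows the cause after exactly one period; the names `pearson_r` and `lagged_r` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_r(cause, effect, lag):
    """Pair cause[t] with effect[t + lag] and correlate the pairs."""
    if lag == 0:
        return pearson_r(cause, effect)
    return pearson_r(cause[:-lag], effect[lag:])

# Hypothetical series: commodity prices respond to petrol prices one period later.
petrol = [70, 72, 75, 74, 78, 82, 80, 85]
goods = [98, 100, 104, 110, 108, 116, 124, 120]

r_without_lag = lagged_r(petrol, goods, 0)
r_with_lag = lagged_r(petrol, goods, 1)
print(round(r_without_lag, 3), round(r_with_lag, 3))
```

The correlation calculated after shifting the dependent series by the lag is higher than the one calculated from the unshifted pairs.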


Important Formulae at a Glance 
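1. Karl Pearson’s Correlation Coefficient in an Individual Series

(i) $r = \dfrac{\text{Covariance}(X, Y)}{\sqrt{\text{Var}(X)}\,\sqrt{\text{Var}(Y)}} = \dfrac{\text{Cov.}(X, Y)}{\sigma_x \sigma_y}$

(ii) $r = \dfrac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{N \sigma_x \sigma_y}$

When step deviation is taken :

$$r = \frac{\sum dx'\,dy' - \dfrac{(\sum dx')(\sum dy')}{N}}{\sqrt{\left[\sum dx'^2 - \dfrac{(\sum dx')^2}{N}\right]\left[\sum dy'^2 - \dfrac{(\sum dy')^2}{N}\right]}}$$

(vi) Finding deviations (x, y) from actual means of related
series in direct method :

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{\sum xy}{N \sigma_x \sigma_y}$$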


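2. Karl Pearson’s Correlation Coefficient in Grouped Series :

(i) $r = \dfrac{\sum f\,dx\,dy - \dfrac{(\sum f\,dx)(\sum f\,dy)}{N}}{\sqrt{\left[\sum f\,dx^2 - \dfrac{(\sum f\,dx)^2}{N}\right]\left[\sum f\,dy^2 - \dfrac{(\sum f\,dy)^2}{N}\right]}}$

(ii) When step deviation is taken :

$$r = \frac{\sum f\,dx'\,dy' - \dfrac{(\sum f\,dx')(\sum f\,dy')}{N}}{\sqrt{\left[\sum f\,dx'^2 - \dfrac{(\sum f\,dx')^2}{N}\right]\left[\sum f\,dy'^2 - \dfrac{(\sum f\,dy')^2}{N}\right]}}$$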


3. Rank Difference Method :
(i) When no two values in the series are the same :

$$r = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

(ii) When two or more values in the series are the same :

$$r = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots\right]}{N(N^2 - 1)}$$


4. Concurrent Deviation Coefficient : 


5. Least Square Method :
(i) For finding $Y_c$ : $Y_c = a + bX$

(ii) Equations used for finding the values of a and b :

$$\sum Y = Na + b \sum X \; ; \quad \sum XY = a \sum X + b \sum X^2$$


(iii) **- 


Sy? 


afar _ oP te [sy 
(iv) 02-20) (v) r \) oy? 


THEORETICAL QUESTIONS 


Long Answer Type Questions 


By -¥.)" 
= 


1. Explain the meaning and importance of correlation. What are the different 
methods of finding correlation ? (Bilaspur 2005) 

2. Explain the meaning of correlation. Distinguish between positive and negative correlation.
(Rewa 2004)


3. What do you understand by Karl Pearson’s correlation coefficient ? Discuss 
briefly its merits and limitations. (Indore 2006) 


4. Write short notes on the following : (Jabalpur 2005) 
(i) Correlation Coefficient, (ii) Concurrent deviation method, (iii) Probable error. 


Short Answer Type Questions 

1. Distinguish between positive and negative correlation. (Rewa 2009) 

2. What is correlation ? What are the limits of the correlation coefficient ?

3. Why does coefficient of correlation ( r ) always lie between — 1 and + 1 ? 

4. If 8-2 w-¥) =1a0.2u-x" = s58.n7-7¥=200 @gnd N = 50, then what is Karl Pearson’s 

correlation coefficient ? 
[ Ans. r = 0.949] 
5. From the data given below calculate coefficient of correlation :

                                            X     Y
Number of terms                             8     8
Mean                                        68    69
Sum of squares of deviations from mean      36    44
Sum of products of deviations = 24

[Ans. $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \times \sum (y - \bar{y})^2}} = \dfrac{24}{\sqrt{36 \times 44}} = 0.6$]


6. Does correlation always indicate a cause and effect relationship between two
variables ?
7. If the correlation coefficient between two variables X and Y is 0.32, the
covariance is 8 and the variance of X is 25, then find the variance of Y.

[Ans. Variance of Y = 25]
8. Find Karl Pearson’s correlation coefficient if N = 50, $\sum X = 75$, $\sum Y = 80$,
$\sum X^2 = 130$, $\sum Y^2 = 140$ and $\sum XY = 120$.
[Ans. r = 0]


9. The covariance between X and Y is + 23,095 and their variances are 525 and 


12,59,391 respectively. Calculate coefficient of correlation. 


[Ans. 0.898 ≈ 0.9]

10. If N = 16 and P.E. = 0.125, show that the correlation is not significant (take 0.6745 ≈ 2/3).

[Ans. r = 0.5 = 4 P.E.]

11. From the universe a sample of 50 husbands and wives was taken. The
correlation coefficient between their ages was 0.85. Determine within what
limits it holds true for the universe.

[ Ans. 0.824 < 


12. From the following data compute the coefficient of correlation between x
and y series :
No. of items n = 10 
Arithmetic Mean *=57 , ¥="7 
Sum of squares of deviations from mean : $\sum (x - \bar{x})^2 = 180$, $\sum (y - \bar{y})^2 = 180$
Sum of the products of deviations of x and y series from their respective
arithmetic means = 90

[Ans. $r = \dfrac{90}{\sqrt{180 \times 180}} = \dfrac{90}{180} = 0.5$]


13. Write the formula showing the relation of standard error and probable error of
correlation coefficient. [Ans. P.E. = 0.6745 × S.E., i.e. approximately 2/3 of S.E.]

14. Write a formula of Karl Pearson’s coefficient of correlation in grouped
(continuous) series.

[Ans. $r = \dfrac{\sum f (x - \bar{x})(y - \bar{y})}{\sqrt{\sum f (x - \bar{x})^2 \times \sum f (y - \bar{y})^2}}$]

15. Explain the significance of correlation. (Bhopal 2008)

16. Explain the linear correlation between two variables. (Vikram 2008) 
17. What is meant by correlation ? (Vikram 2009) 

18. Find ‘r’ from the following data : 

X = 225, || Y=189, n =10, |) ( X —22) ~ = 85 

(4-19) * =25 and ( X —22)( Y — 18) = 43 

[ Ans. r = 0.97) 
19. Calculate coefficient of correlation from the following results :
N = 10, $\sum X = 100$, $\sum Y = 150$

$\sum (X - 10)^2 = 180$, $\sum (Y - 15)^2 = 215$

$\sum (X - 10)(Y - 15) = 60$ [Ans. r = 0.305]

20. Given : r = 0.64, P.E. = 0.1312, calculate the value of N. [Ans. N = 9]
21. Given : N = 10, P.E. = 0.084, find the value of r and interpret it.

[Ans. r = + 0.78, significant because r > 6 P.E.]


22. Calculate coefficient of correlation between x and y series from the following
data :
                                                      x      y
No. of pairs of items                                 15     15
Arithmetic Mean                                       25     18
Sum of squares of deviations from arithmetic means    136    138

Summation of product of deviations from respective means of x and y series = 122
(Vikram 2008)

[Ans. r = 0.89]


OBJECTIVE QUESTIONS 
State whether following statements are true or false 
1. There are definite limits of the value of r.
2. The probable error is approximately 2/3 of the standard error.
3. Correlation means comparison between two variables.
4. Coefficient of correlation is the measure of association between two attributes.
5. Correlation always reveals the cause and effect relationship between the two
variables.

6. When two variables are independent, the correlation coefficient between them 
will be zero but the converse will not always be true. 


[Ans. 1. True, 2. True, 3. False, 4. False, 5. False, 6. True.] 
Choose the correct answers 


1. The limits of Karl Pearson’s correlation coefficient are :
(a) ± 1 (b) ± 2 (c) ± 3 (d) none of these

2. If the correlation coefficient between x and y is 0.58, then the correlation
coefficient between u = −2x + 3 and v = y − 3 will be :

(a) 0.58 (b) − 0.58 (c) 0.29 (d) 1.16

3. The correlation between price and demand of a commodity is :

(a) positive (b) negative (c) zero (d) none of these

4. In a correlation study the results were : $\sum xy = 40$, N = 100, $\sum x^2 = 80$, $\sum y^2 = 20$.
The correlation coefficient is :
(a) 1.0 (b) − 1.0 (c) zero (d) none of these

Note : $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \dfrac{40}{\sqrt{80 \times 20}} = 1$


5. The correlation between two variables will be of high degree if : (Bhopal 2007)
(a) r is more than 0.5 but less than 0.75 (b) r is less than 0.5

(c) r is more than 0.75 but less than 1 (d) r is equal to 1

6. The standard error of correlation coefficient is :
( 


a) = (b) * (c) * (d) = 
7. Karl Pearson’s correlation coefficient is always : 
(a) more than 1 (b) less than — 1 (c) between — 1 and + 1 (d) more than 1 


8. Correlation will be negative if with an increase in the value of X : 

(a) there is also an increase in the value of Y (b) the value of Y decreases
(c) the value of Y remains unchanged (d) none of these

9. If $x = X - \bar{X}$ and $y = Y - \bar{Y}$ and the number of pairs is n, then :

(a) © fey 2sy2 


10. The formula for the rank correlation coefficient is : 


= n&xy ne Zxy ct, xy 


. y> Ly? 2552 (c) nydx" 2Sy? (d) ‘ © nEx2Zy2 


6zd? 63d” 63d” 1-6Sd? 


(a) “NN (b) “NBD (Cc) TOD (d) NOH 
[ Ans. 1. (a), 2. (b), 3. (b), 4. (a), 5. (d), 6. (c), 7. (c), 8. (b), 9. (a), 10. (b).] 
REGRESSION ANALYSIS


— °* Meaning and Uses of Regression. 

— ° Difference between Correlation and Regression. 
— ° Linear Regression. 

— * Regression Equations. 

— °* Calculation of Coefficient of Regression. 


MEANING OF REGRESSION 


Regression is a statistical measure which explains the average
relationship between two or more variables. With its help the
probable value of one variable can be estimated (or predicted) from
the known value of another related variable.

The direction and the degree of relationship between two or more
variables are known by the correlation technique. In simple words, the
correlation coefficient tells us whether the relationship between two
variables is positive or negative, and whether it is of high, moderate
or low degree. It does not tell us what the probable value of one
variable will be for a given value of the other variable. This work is
done by regression.

For example, the correlation coefficient tells us the direction and
degree of the relationship between price and demand, but it does not
tell us how much of a commodity will be demanded at a given price. That
can be determined by the regression technique. Similarly, the
correlation coefficient tells us the direction and degree of the
relationship between advertisement and sale of a commodity, but not what
the amount of sale will be for a definite advertising expenditure.

This information is possible only through regression. Hence
regression is an important statistical technique which describes the
average relationship between two or more variables and by which the
probable value of one variable can be estimated on the basis of a given
value of another variable.


ORIGIN OF REGRESSION 


The dictionary meaning of regression is returning or going back. Sir
Francis Galton used this word for the first time in 1877 in his research
article ‘Regression towards mediocrity in hereditary stature’ to study
the relationship between the heights of fathers and their sons. He
studied the heights of 1000 fathers and sons and arrived at some very
interesting conclusions, which are given below :

(1) Tall fathers have tall sons while short fathers have short sons.

(2) The average height of the sons of tall fathers is less than the
average height of their fathers.

(3) The average height of the sons of short fathers is more than the
average height of their fathers.


(4) Galton found that the deviation of the average height of the
sons from the average height of the human race was less than the
deviation of the average height of the fathers from the average
height of the human race. When the average height of fathers was
more or less than the average height of the human race, the average
height of sons tended to go back, or regress, towards the average
height of the human race. Galton called this tendency ‘Regression
towards mediocrity’.

Galton found in his analysis that there is a high degree of positive
correlation between the heights of fathers and sons. Suppose this
correlation coefficient (r) is 0.8. It means that if the average height
of a group of fathers is 1 inch more than the average height of the
human race, the average height of their sons will exceed the average
height of the human race by only 0.8 inch; that is, the height of sons
increases less and tends to regress towards the average height of the
human race.

Galton called the line describing this tendency to regress the
‘Regression Line’. This line describes the average relationship between
two series and throws light on their covariation. In modern times
regression is used in all those fields where two or more related
variables show a tendency to regress towards the general mean. Modern
writers prefer the term ‘estimating line’ to ‘regression line’ because
it describes the purpose of the line more clearly.


DEFINITIONS OF REGRESSION 
Following are the main definitions of regression : 
According to M.M. Blair, “Regression is the measure of the
average relationship between two or more variables in terms of the
original units of the data.”

According to Morris Hamburg, “The term ‘regression analysis’
refers to the methods by which estimates are made of the values of
a variable from a knowledge of the values of one or more other
variables and to the measurement of the errors involved in this
estimation process.”

According to Ya-Lun Chou, “Statistical techniques which attempt
to establish the nature of the relationship between variables, that is,
to study the functional relationship between the variables and
thereby provide a mechanism for prediction or forecasting.”

According to Taro Yamane, “One of the most frequently used
techniques in economics and business research, to find a relation
between two or more variables that are related causally, is
regression analysis.”

According to Wallis and Roberts, “It is often more important to
find out what the relation actually is, in order to estimate or predict
one variable (the dependent variable) and the statistical technique
appropriate to such a case is called regression analysis.”

From above definitions it is clear that regression is a statistical 
measure by which the value of one variable can be estimated on the 
basis of known values of other variable. The variable which is used 
to estimate or predict the other variable is called independent 
variable or explanatory variable and the variable which is predicted 
or estimated is called dependent variable or explained variable. We 
denote independent variable by X and dependent variable by Y. This 
evaluation or estimate is based upon the average relationship 
between two variables determined by regression analysis and the 
equation which is used to estimate is called regression equation or 
explaining equation. This equation may be linear or in other forms. 


USES OF REGRESSION 


In the economic and social sciences regression analysis is extremely
useful in the analytical study of problems and their solution.
Forecasting has an important place in the economic field, and regression
provides reliable estimates. For example, if the relationship between
the price and the supply of a commodity is established, then with the
help of regression it can be determined what effect a definite change in
the price of the commodity will have on its supply, or what the price
will be for a definite amount of supply. Regression analysis is also
helpful in studying problems in other sciences where a cause and effect
relation exists between two variables.


DIFFERENCE BETWEEN CORRELATION AND 
REGRESSION 


Though the two statistical techniques, correlation and regression,
both deal with the relationship between two or more variables, the two
methods are different. Following are the main differences between
correlation and regression :

(1) The correlation coefficient measures the degree of covariability
between two variables X and Y. On the other hand, the purpose of
regression analysis is to study the nature of the relationship between
these variables, so that the value of the dependent variable can be
estimated on the basis of the independent variable.

(2) Correlation analysis is a statistical technique to measure the
degree of relationship between two variables. It does not tell the
cause and effect relation between them. Under the regression technique
the value of the dependent variable is determined by the value of the
independent variable; here the independent variable is treated as the
cause and the dependent variable as its effect.

(3) In correlation analysis $r_{xy}$ measures the direction and degree
of linear relationship between two variables X and Y. $r_{xy}$ and
$r_{yx}$ are symmetric, i.e., $r_{xy} = r_{yx}$, so it is immaterial
which of X and Y is the dependent variable and which is the independent
variable. Contrary to it, in regression analysis the regression
coefficients $b_{xy}$ and $b_{yx}$ are not symmetric, i.e.,
$b_{xy} \neq b_{yx}$ in general. Hence it must be clearly indicated which
is the dependent variable and which is the independent variable.

(4) Correlation may be nonsense but regression is never 
nonsense. 

(5) Regression coefficients and correlation coefficient have the 
same sign. 


LINEAR REGRESSION 
Regression lines are the lines of best fit to describe the average 
relationship between series of two variables. The best probable 
value of one variable can be estimated on the basis of a given value 


of other variable through these lines. Regression lines are drawn by 
two methods : 

(1) Graphical method 

(2) Algebraical least square method. 


When regression lines are drawn by graphical method, scatter 
diagram is constructed for it. Under this method, each pair of X and 
Y variables is plotted on the graph in the form of a point. Generally, 
the variable X is shown on horizontal scale and the variable Y is 
shown on the vertical scale. When all paired values of X and Y 
variables are plotted in the form of points then two regression lines 
are drawn through these points to estimate the values of X and Y. 
These regression lines are drawn in such a way that the sum of 
deviations on one side of line is equal to the sum of deviations on its 
other side. That is why it is called the line of best fit. The regression 
line which is used to know the probable value of Y for a value of X is 
called the Regression line of Y on X. Similarly the regression line 
which is used to know the probable value of X for a value of Y is 
called the Regression line of X on Y. These two regression lines cut
each other at the point of means of the two series. If perpendiculars
are drawn on the X and Y axes from this point of intersection, the
arithmetic means of the X and Y variables are obtained. If the
correlation between two variables X and Y is perfect, i.e., the
correlation coefficient is either + 1 or − 1, there will be only one
regression line. The reason is that the two variables then increase or
decrease in a constant ratio, so the two lines coincide. When there is
no correlation between the two variables (r = 0), the two regression
lines cut each other at a right angle (90°). These facts are clear in
the figures.
Fig.

The algebraic least square method is the best method to draw these
regression lines. Under this method the sum of squares of the
deviations between the given values of one variable and its
estimated values obtained by the line of best fit is minimised. This
line of best fit is given by the equation of a straight line, Y = a + bX.
In the least square method, this line is obtained with the help of two
normal equations :

(1) $\sum Y = Na + b \sum X$

(2) $\sum XY = a \sum X + b \sum X^2$

The values a and b are constants. The first constant a is the intercept,
i.e., the point where the regression line crosses the ordinate. In simple
words, a is the distance between the origin and the point where the
regression line crosses the ordinate. If a is positive the regression
line will cross the ordinate above the origin, and if a is negative it
will cross the ordinate below the origin.

Constant b in the equation is the slope of the regression line. It is 
also called regression coefficient. It gives us a measure of the 
change in dependent variable for a unit change in independent 
variable. If b is positive then the regression line will be a line rising 


from left to right and if b is negative then regression line will be a line 
declining from left to right. 
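
The two normal equations can be solved for a and b directly. The following is a minimal sketch on hypothetical data; the function name `fit_normal_equations` is an assumption made for illustration.

```python
def fit_normal_equations(xs, ys):
    """Solve  sum(Y) = N*a + b*sum(X)  and  sum(XY) = a*sum(X) + b*sum(X^2)  for a and b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Eliminating a from the two equations gives the slope b.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Substituting b back into the first equation gives the intercept a.
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_normal_equations(xs, ys)
print(round(a, 2), round(b, 2))
```

For these figures the line of best fit is Y = 2.2 + 0.6 X; b is positive, so the line rises from left to right.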

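The actual values of the variables X and Y are used to find the
regression lines by the above method. To simplify the calculation,
deviations taken from the means can be used in place of the actual
values. In that case the equation Y = a + bX takes the following form :

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

Calculation of $b_{yx}$ :

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$\Rightarrow (X - \bar{X})(Y - \bar{Y}) = b_{yx}(X - \bar{X})^2$$
$$\Rightarrow \sum (X - \bar{X})(Y - \bar{Y}) = b_{yx} \sum (X - \bar{X})^2$$
$$\Rightarrow b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum xy}{\sum x^2}$$

Similarly the equation X = a + bY can be written in the following
way :

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

where $b_{xy} = \dfrac{\sum xy}{\sum y^2}$, with $x = X - \bar{X}$ and $y = Y - \bar{Y}$.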


But if the means of X and Y are in decimals, the calculation
becomes more difficult. To avoid this problem we can use deviations
taken from assumed means. Thus,

$$b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}$$

and

$$b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}$$

where, $dx = X - A_x$ ; here $A_x$ is the assumed mean of X

$dy = Y - A_y$ ; here $A_y$ is the assumed mean of Y
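
That the assumed-mean formulae give the same regression coefficients as deviations from the actual means can be checked with a short sketch. The data and the assumed means (20 and 40) are hypothetical, and the function names are illustrative assumptions.

```python
def coefficients_from_assumed_means(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) using deviations dx, dy taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    numerator = sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy) / n
    b_yx = numerator / (sum(p * p for p in dx) - sum(dx) ** 2 / n)
    b_xy = numerator / (sum(q * q for q in dy) - sum(dy) ** 2 / n)
    return b_xy, b_yx

def coefficients_from_actual_means(xs, ys):
    """Return (b_xy, b_yx) using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / syy, sxy / sxx

# Hypothetical data whose means are not whole numbers.
xs = [12, 15, 17, 20, 23, 26]
ys = [31, 36, 38, 45, 49, 52]
shortcut = coefficients_from_assumed_means(xs, ys, 20, 40)
direct = coefficients_from_actual_means(xs, ys)
print(shortcut, direct)
```

Both routes should print the same pair of coefficients up to rounding, whichever assumed means are chosen.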


Note : Why do we need two regression lines ? There are two
variables X and Y in simple regression analysis. One line presents
the regression of Y on X. It is determined by taking
X as the independent variable and Y as the dependent variable. It can be
used to find Y for a given value of X.

Similarly the second line presents the regression of X on Y. Here we
take Y as the independent variable and X as the dependent variable. It
can be used to find X for a given value of Y. Thus, we need two
regression lines. We can also say that the regression line of Y on X
minimizes the sum of squares of vertical deviations, while the
regression line of X on Y minimizes the sum of squares of horizontal
deviations.


REGRESSION EQUATIONS 


Regression equations present the regression lines. Since there 
are two regression lines hence there will also be two regression 
equations. 


(1) Regression equation of X on Y : It is used to determine the best
possible average value of X (dependent variable) for a given value
of Y (independent variable). The regression line of X on Y is obtained
by plotting the values given by this equation on graph paper. The
equation is X = a + bY.

(2) Regression equation of Y on X : The regression equation of Y on
X tells the changes in the values of Y for given changes in X, and it is
used to estimate the values of Y for given values of X. The
equation is Y = a + bX.

Calculation of constants a and b in both equations : The first
constant a is the intercept, i.e., the point where the regression line
crosses the ordinate (Y-axis). When a is positive, the regression line
crosses the Y-axis above the origin ‘O’ and when a is negative, the
regression line crosses the ordinate below the origin ‘O’. If a is
zero, the regression line starts from the origin ‘O’. The algebraic
measure of the intercept is :

(i) $a = \bar{X} - b\bar{Y}$ in X = a + bY

(ii) $a = \bar{Y} - b\bar{X}$ in Y = a + bX

The second constant b represents the slope of the regression line. It
is known as the regression coefficient. It determines the change in the
dependent variable for a unit change in the independent variable. If b
is positive the regression line will rise from left to right, and if b
is negative the regression line will decline from left to right.
Algebraically the value of b can be expressed in terms of the
correlation coefficient and the standard deviations :

(i) $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ in the first equation. (ii) $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ in the second equation.
Here $\sigma_x$ and $\sigma_y$ are the standard deviations of the x and y series and r is
the correlation coefficient between the two series. Thus, the regression
lines will be :

(i) Regression equation of X on Y : X = a + bY

$$X = (\bar{X} - b\bar{Y}) + bY$$

$$(X - \bar{X}) = bY - b\bar{Y}$$

$$(X - \bar{X}) = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

(ii) Regression equation of Y on X : Y = a + bX

$$Y = (\bar{Y} - b\bar{X}) + bX$$

$$(Y - \bar{Y}) = bX - b\bar{X}$$

$$(Y - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$


REGRESSION COEFFICIENT 


At the time of regression analysis of two related series, their two 
regression coefficients are also calculated. Regression coefficient is 
a ratio which tells what the rate of average change will be in the 
values of dependent variable for a unit change of independent 
variable in the series. Actually, regression coefficient is the slope of 
regression line. Like regression lines, regression coefficients are also 
of two types : 


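Regression coefficient of X on Y : This coefficient is the
measure of the slope of the regression line of X on Y, which tells how
much X changes for a unit change in Y. The symbol $b_{xy}$ is
used for it. The following formula is used to measure it :

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$

On the basis of this coefficient, the regression line of X on Y is
written as :

$$X - \bar{X} = b_{xy}(Y - \bar{Y}) \qquad \text{Here, } b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

Formula for regression coefficient of Y on X :

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

On the basis of this regression coefficient, the regression line of Y
on X is written as :

$$(Y - \bar{Y}) = b_{yx}(X - \bar{X}) \qquad \text{Here, } b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$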


Determination of Coefficient of Correlation by 
Regression Coefficients 

The correlation coefficient can be determined on the basis of the
regression coefficients of the X and Y series. The correlation
coefficient is the geometric mean of the two regression coefficients. In
simple words, the correlation coefficient is the square root of the
product of the regression coefficients :

$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x}}$$

r takes the common sign of the two regression coefficients.

Example : If two regression coefficients are −0.9 and −0.5, then
find the value of the correlation coefficient. (Bhopal 2004)
Solution : Let $b_{xy} = -0.9$ and $b_{yx} = -0.5$

$$r = \pm\sqrt{b_{xy} \times b_{yx}} = -\sqrt{0.9 \times 0.5} = -\sqrt{0.45} = -0.67$$

∵ $b_{xy}$ and $b_{yx}$ both are negative

∴ The value of r will be negative.

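
The rule that r is the geometric mean of the two regression coefficients, carrying their common sign, can be sketched as below. The function name `r_from_regression_coefficients` is an illustrative assumption.

```python
import math

def r_from_regression_coefficients(b_xy, b_yx):
    """r = square root of (b_xy * b_yx), with the common sign of the two coefficients."""
    product = b_xy * b_yx
    if product < 0:
        # The two regression coefficients always share the same sign.
        raise ValueError("regression coefficients must have the same sign")
    magnitude = math.sqrt(product)
    return -magnitude if b_xy < 0 else magnitude

r = r_from_regression_coefficients(-0.9, -0.5)
print(round(r, 2))
```

For the coefficients of the example, −0.9 and −0.5, the sketch returns −0.67.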
STANDARD ERROR OF THE ESTIMATE 


With the help of regression equations the most likely value of the
dependent variable is estimated for a given value of the independent
variable. In other words, the best estimate of Y is obtained for a given
X from the regression of Y on X, and the best estimate of X for a given
Y from the regression of X on Y. But these estimated values are the most
likely values, not the actual values. For example, if the demand of a
commodity is estimated by a regression equation at 150 units for a given
price of ₹ 5 per unit, this estimate is only the probable demand. The
actual demand may be more or less than 150 units. It is therefore
essential to test the reliability of the estimates, and the standard
error is calculated for this purpose. If the standard error is small,
the estimated values are considered to be very close to the actual ones.

The square root of the average of squares of deviations of actual
values from the values estimated through the regression equations is
called the standard error of estimate. Its calculation is analogous to
that of standard deviation. The only difference is that in the standard
deviation the deviations of actual values are taken from their
arithmetic mean, while in the standard error they are taken from the
estimated values of the dependent variable. The following formulae are
used for the calculation of standard errors :

Calculation of standard error of X on Y : $S_{xy} = \sqrt{\dfrac{\sum (X - X_e)^2}{N}}$

Calculation of standard error of Y on X : $S_{yx} = \sqrt{\dfrac{\sum (Y - Y_e)^2}{N}}$

Here, (1) $S_{xy}$ and $S_{yx}$ are the respective standard errors of the values
estimated through the regression equations of X on Y and of Y on X.

(2) X and Y are the actual values of the X and Y series respectively.

(3) $X_e$ and $Y_e$ are the respective estimated values of the X and Y
series obtained through the regression equations.

(4) N is the number of items.

Essential Work for the Calculation of Standard Errors

(1) First of all, the probable values of the X variable on the basis of
the values of Y, and of the Y variable on the basis of the values of X,
are calculated through the regression equations.

(2) The deviations of the actual values from their estimated values are
determined for the X variable. Similarly, the deviations between the
actual values and the estimated values of the Y variable are determined.

(3) These deviations are squared, their total is found and this
total is divided by the number of items.

(4) The square root of the quotient is determined. This square
root is the standard error.

Note : If the number of sample units is small then $\sum (Y - Y_e)^2$ and
$\sum (X - X_e)^2$ should be divided by N − 2 instead of N. This
follows from the rules of degrees of freedom.
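
The four steps above, together with the small-sample note, can be sketched as follows. The actual values are hypothetical, and the estimated values are assumed to come from the hypothetical regression line Y = 2.2 + 0.6 X for X = 1, 2, ..., 5; the function name is an illustrative assumption.

```python
import math

def standard_error_of_estimate(actual, estimated, small_sample=False):
    """Square root of the mean squared deviation of actual from estimated values.

    With small_sample=True the total is divided by N - 2 instead of N.
    """
    n = len(actual)
    total_of_squares = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    divisor = n - 2 if small_sample else n
    return math.sqrt(total_of_squares / divisor)

# Hypothetical actual values of Y and their estimates from Y = 2.2 + 0.6 X.
y_actual = [2, 4, 5, 4, 5]
y_estimated = [2.8, 3.4, 4.0, 4.6, 5.2]

s_yx = standard_error_of_estimate(y_actual, y_estimated)
s_yx_small = standard_error_of_estimate(y_actual, y_estimated, small_sample=True)
print(round(s_yx, 4), round(s_yx_small, 4))
```

Dividing by N − 2 gives a somewhat larger standard error than dividing by N, which matters most when N is small.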


$\sum (Y - Y_e)^2$ and $\sum (X - X_e)^2$ are called the unexplained variation
in the Y and X variables respectively. So the standard error is also
called the average measure of these unexplained variations. Hence
standard errors are also calculated as follows :

$$S_y = \sqrt{\frac{\text{Unexplained variation in } Y}{N}} \quad \text{and} \quad S_x = \sqrt{\frac{\text{Unexplained variation in } X}{N}}$$

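(1) Alternative method : The calculation of standard errors
through the above method is laborious because the estimated
values of both variables have to be determined. The following is a
simpler method to measure the standard error :

(a) $S_y = \sqrt{\dfrac{\sum Y^2 - a \sum Y - b \sum XY}{N}}$

(b) $S_x = \sqrt{\dfrac{\sum X^2 - a \sum X - b \sum XY}{N}}$

Here a and b are the constants of the respective regression equation
(Y on X in (a), X on Y in (b)). In this way standard errors can be
calculated directly by using the least square method.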


If the estimated values of the X and Y series are not determined but the
values of r, $\sigma_x$ and $\sigma_y$ are known to us, then we can use the second
alternative method to determine the standard error.
(2) Alternative method :

X on Y : $S_{xy} = \sigma_x \sqrt{1 - r^2}$

Y on X : $S_{yx} = \sigma_y \sqrt{1 - r^2}$

Calculation of correlation coefficient under the principle of least square method :

$$r = \sqrt{1 - \frac{S_{yx}^2}{\sigma_y^2}}$$
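
The agreement between the direct calculation and the two shortcut formulae, $\sqrt{(\sum Y^2 - a\sum Y - b\sum XY)/N}$ and $\sigma_y\sqrt{1 - r^2}$, can be checked numerically. The sketch below uses hypothetical data and variable names chosen only for illustration; it treats the regression of Y on X.

```python
import math

# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = sxy / sxx                      # regression coefficient of Y on X
a = mean_y - b * mean_x            # intercept of Y on X
r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
sigma_y = math.sqrt(syy / n)       # standard deviation of Y

# Direct method: deviations of actual values from estimated values.
direct = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)

# First alternative: sums of Y^2, Y and XY with the constants a and b.
sum_yy = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
via_sums = math.sqrt((sum_yy - a * sum(ys) - b * sum_xy) / n)

# Second alternative: sigma_y * sqrt(1 - r^2).
via_r = sigma_y * math.sqrt(1 - r ** 2)

print(round(direct, 4), round(via_sums, 4), round(via_r, 4))
```

All three values should coincide up to rounding, which is why the shortcuts can replace the laborious direct method.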
The standard error of estimate is related to the regression line in
the same way as the standard deviation is related to the arithmetic
mean. If the distribution is normal, 68.27% of the actual values
would lie within ± $S_{yx}$ on the two sides of the regression line of Y
on X, 95.45% would lie within ± 2 $S_{yx}$ and 99.73% would lie within
± 3 $S_{yx}$. It means that if we add 3 S.E. to, and subtract 3 S.E. from,
the best estimate determined by regression analysis, we get upper and
lower limits within which the actual value lies with a probability of
99.73%. The less the actual values are scattered on the two sides of
the regression line, the smaller the standard error will be.
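The percentages quoted above are the areas of the normal curve within 1, 2 and 3 standard errors. The sketch below checks them with the error function and builds the corresponding limits; the estimate 71.75 and the standard error 2.0 are assumed values used only for illustration.

```python
import math

def normal_coverage(k):
    """Share of a normal distribution lying within +/- k standard errors."""
    return math.erf(k / math.sqrt(2))

def limits(estimate, standard_error, k):
    """Lower and upper limits: estimate -/+ k standard errors."""
    return estimate - k * standard_error, estimate + k * standard_error

for k in (1, 2, 3):
    print(k, round(normal_coverage(k) * 100, 2))

# Hypothetical figures: an estimate of 71.75 with a standard error of 2.0
print(limits(71.75, 2.0, 3))
```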


IMPORTANT FORMULAE 


Regression Equations 


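1. Given r, $\sigma_x$, $\sigma_y$ in the question,

X on Y : $X - \bar{X} = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$

Y on X : $Y - \bar{Y} = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$

2. Given $b_{xy}$ and $b_{yx}$ in the question,

X on Y : $X - \bar{X} = b_{xy}(Y - \bar{Y})$
Y on X : $Y - \bar{Y} = b_{yx}(X - \bar{X})$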


3. When least square method is to be used in the question, 
then 


X on Y : X = a + bY


reanartsy (1) 
Y on X : Y = a + bX


Regression Coefficients 


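1. Given r, $\sigma_x$, $\sigma_y$ in the question,

X on Y : $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$

Y on X : $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$

2. When deviations are taken from the actual mean, then

X on Y : $b_{xy} = \dfrac{\sum dxdy}{\sum dy^2}$

Y on X : $b_{yx} = \dfrac{\sum dxdy}{\sum dx^2}$

3. When deviations are taken from an assumed mean (not from the
actual mean), then

X on Y : $b_{xy} = \dfrac{N\sum dxdy - (\sum dx)(\sum dy)}{N\sum dy^2 - (\sum dy)^2}$

Y on X : $b_{yx} = \dfrac{N\sum dxdy - (\sum dx)(\sum dy)}{N\sum dx^2 - (\sum dx)^2}$

Correlation coefficient : $r = \sqrt{b_{xy} \times b_{yx}}$
Standard error of estimate :

X on Y : $S_{xy} = \sigma_x \sqrt{1 - r^2}$
Y on X : $S_{yx} = \sigma_y \sqrt{1 - r^2}$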
Illustration 1. 
You are given the following results of heights and weights of 1,000 men :

                       Weight (lbs)   Height (inches)
Mean                       150             68
Standard Deviation          20             2.5


Coefficient of correlation = 0.6 

Estimate from the above data : 

(a) the weight of a man, who is 5 feet tall. 

(b) the height of a man whose weight is 200 lbs.


Solution : 
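Let weight be represented by X and height be represented by Y.
Then, $\bar{X} = 150$, $\sigma_x = 20$, $\bar{Y} = 68$, $\sigma_y = 2.5$, r = 0.6
(i) Regression of X on Y :

$(X - \bar{X}) = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$

or $(X - 150) = 0.6 \times \dfrac{20}{2.5}(Y - 68)$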
or ( X — 150) = 4.8 ( Y — 68) 
or X —150 = 4.8 Y — 326.4 
or X =4.8 Y — 326.4 + 150 
X=4.8 Y -176.4 
Value of X when Y is 5' or 60" 
X = (4.8 x 60) -176.4 = 288 —176.4 = 111.6 
(ii) Regression of Y on X: 
$(Y - \bar{Y}) = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$ or $(Y - 68) = 0.6 \times \dfrac{2.5}{20}(X - 150)$
or ( Y — 68) = 0.075 ( X —150) 
or ( Y -68) = 0.075 X —11.25 
or Y = 0.075 X —11.25 + 68 
or Y= 0.075 X + 56.75 
Value of Y when X is 200 lbs. 
or Y = (0.075 x 200) + 56.75 


or Y= 15 + 56.75 = 71.75" 

Thus, the estimated weight of a person whose height is 5 ft. is
111.6 lbs and the estimated height of a person whose weight is 200
lbs is 71.75".
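The two estimates in Illustration 1 follow one pattern: the regression coefficient is r times the ratio of the standard deviations (dependent over independent), and the estimate is the mean of the dependent variable plus that coefficient times the deviation of the given value from its own mean. A minimal sketch, with illustrative function names:

```python
def regression_coefficient(r, sd_dependent, sd_independent):
    """b = r * (S.D. of dependent variable / S.D. of independent variable)."""
    return r * sd_dependent / sd_independent

def estimate(value, mean_dependent, mean_independent, r, sd_dependent, sd_independent):
    """Estimate the dependent variable for a given value of the independent one."""
    b = regression_coefficient(r, sd_dependent, sd_independent)
    return mean_dependent + b * (value - mean_independent)

# Illustration 1: X = weight (mean 150, S.D. 20), Y = height (mean 68, S.D. 2.5), r = 0.6
weight = estimate(60, 150, 68, 0.6, 20, 2.5)    # X on Y, height of 60 inches
height = estimate(200, 68, 150, 0.6, 2.5, 20)   # Y on X, weight of 200 lbs
print(round(weight, 2), round(height, 2))
```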

Illustration 2. 


Given the following data, calculate the expected value of Y when the value of
X is 12 :

                        X     Y
Arithmetic Mean         8    15
Standard Deviation      4     3

Coefficient of correlation between X and Y = 0.99
Solution : 

The regression equation of Y on X is to be used on the basis of the
given information :

Regression of Y on X :

$Y - \bar{Y} = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$

Putting the given values, $Y - 15 = 0.99 \times \dfrac{3}{4}(12 - 8)$
or Y - 15 = 0.7425 (4)
or Y -15 = 2.97 
Y=2.97 +15 
Y = 17.97 
Thus probable value of Y for given value of X = 12 will be 17.97 


Illustration 3. 
Given : 
X-Series Y-Series 
Mean 18 100 
Standard Deviation 14 20 


Coefficient of correlation between X and Y series = 0.8 


Find the most probable value of Y if X is 70, and most probable value of X if 
Y is 90. 


Solution : 
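Given : $\bar{X} = 18$, $\bar{Y} = 100$, $\sigma_x = 14$, $\sigma_y = 20$, r = + 0.8
To determine the probable value of Y when X = 70 :
Y on X : $(Y - \bar{Y}) = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$

or $(Y - 100) = 0.8 \times \dfrac{20}{14}(70 - 18)$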

or ( Y —100) = 1.143 x 52 
or Y = 59.44 + 100 = 159.44 
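To determine the probable value of X when Y = 90 :
X on Y : $(X - \bar{X}) = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
or $(X - 18) = 0.8 \times \dfrac{14}{20}(90 - 100)$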
or ( X —18) = 0.56 (—10) 
or X = 18-—5.6 
or X= 12.4 
Note : When both means $\bar{X}$ and $\bar{Y}$, both standard deviations $\sigma_x$
and $\sigma_y$, and the sum of the products of deviations from the means
($\sum dxdy$) are given, then first of all the correlation coefficient is
determined from the following formula. The rest is solved as in the
previous question.

$r = \dfrac{\sum dxdy}{N\sigma_x\sigma_y}$
Illustration 4. 

The following data based on 450 students, are given for marks in Statistics and 
Economics at a certain examination : 

Statistics Economics 

Mean marks 40 48 

S. D. of marks 12 16 

Sum of the products of the deviations of marks from their respective means = 
42075. 

Give the equations to the two lines of regression and estimate the average 
marks in Economics of the candidate who obtained 50 marks in Statistics. 


Solution : 
Let marks in Statistics be X and marks in Economics be Y.
Given : $\bar{X} = 40$, $\bar{Y} = 48$, $\sigma_x = 12$, $\sigma_y = 16$, $\sum dxdy = 42075$, N = 450

$r = \dfrac{\sum dxdy}{N\sigma_x\sigma_y} = \dfrac{42075}{450 \times 12 \times 16} = + 0.49$


Regression equations,
X on Y : $(X - \bar{X}) = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$

$(X - 40) = 0.49 \times \dfrac{12}{16}(Y - 48)$


or ( X — 40) = 0.3675 ( Y — 48) 
or ( X — 40) = 0.3675 Y -17.64 
or X = 0.3675 Y -17.64 + 40 
or X = 0.3675 Y + 22.36 

Y on X :

$(Y - \bar{Y}) = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$

$(Y - 48) = 0.49 \times \dfrac{16}{12}(X - 40)$


or ( Y — 48) = 0.653 ( X —40) 
or ( Y — 48) = 0.653 X -—26.12 
or Y =0.653 X — 26.12 + 48 
or Y= 0.653 X + 21.88 
Value of Y , when X = 50 
Y = 0.653 X + 21.88 
Y = (0.653 x 50) + 21.88 = 32.65 + 21.88 = 54.53 
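The first step of Illustration 4, finding r from the sum of the products of deviations, can be sketched as follows. The function name is illustrative, and r is rounded to two decimals before use only because the worked solution does so.

```python
def correlation_from_products(sum_dxdy, n, sd_x, sd_y):
    """r = (sum of products of deviations from the means) / (N * sd_x * sd_y)."""
    return sum_dxdy / (n * sd_x * sd_y)

# Illustration 4: 450 students, S.D. 12 (Statistics) and 16 (Economics)
r = correlation_from_products(42075, 450, 12, 16)
r_rounded = round(r, 2)

# Y on X estimate for X = 50, using the rounded r as in the worked solution
b_yx = r_rounded * 16 / 12
economics_marks = 48 + b_yx * (50 - 40)
print(r_rounded, round(economics_marks, 2))
```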
Illustration 5. 
The following data relate to the marks of students in Management and
Statistics in a particular examination :

                      Management   Statistics
Mean                     39.5         47.5
Standard deviation       10.8         16.8

r = 0.42.
Find the two regression lines. Find the most likely marks in Statistics of a
student who has secured 34 marks in Management.


Solution : 

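Let the marks in Management be represented by X and the marks in
Statistics by Y. Then $\bar{X} = 39.5$, $\bar{Y} = 47.5$, $\sigma_x = 10.8$, $\sigma_y = 16.8$ and

r = 0.42

Regression line X on Y

$X - \bar{X} = b_{xy}(Y - \bar{Y})$

where $b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{0.42(10.8)}{16.8} = \dfrac{4.536}{16.8} = 0.27$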


Substituting all the values,

X - 39.5 = 0.27 (Y - 47.5)

= 0.27 Y - 12.825

∴ X = 0.27 Y - 12.825 + 39.5
or X = 0.27 Y + 26.675

Regression line Y on X

$Y - \bar{Y} = b_{yx}(X - \bar{X})$

where $b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{0.42(16.8)}{10.8} = \dfrac{7.056}{10.8} = 0.65$

Substituting all the values,
Y - 47.5 = 0.65 (X - 39.5) = 0.65 X - 25.675
∴ Y = 0.65 X - 25.675 + 47.5

or Y = 0.65 X + 21.825


The marks of the student in Management are 34, i.e., X = 34. The
corresponding probable value of Y is obtained from the
regression line Y on X.

Y = 0.65 X + 21.825

= 0.65 (34) + 21.825

= 22.1 + 21.825 = 43.925 ≈ 44
Illustration 6. 

The following measures are given :

                           X     Y
Mean                      36    85
Standard deviation        11     8

Correlation coefficient = 0.66

Find both regression equations.

Solution : 


Regression equation of X on Y

$(X - \bar{X}) = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$

Putting the given values, $X - 36 = 0.66 \times \dfrac{11}{8}(Y - 85)$
X - 36 = 0.9075 (Y - 85)

X - 36 = 0.9075 Y - 77.1375

X = 0.9075 Y - 77.1375 + 36

X = 0.9075 Y - 41.1375

X = - 41.1375 + 0.9075 Y


Regression equation of Y on X

$(Y - \bar{Y}) = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$

Putting the given values, $Y - 85 = 0.66 \times \dfrac{8}{11}(X - 36)$

Y - 85 = 0.48 (X - 36)
Y - 85 = 0.48 X - 17.28

Y = 0.48 X - 17.28 + 85
Y = 0.48 X + 67.72

Y = 67.72 + 0.48 X


Illustration 7. 
Following measures are given : 


Variance of X = 25, Regression equation of X on Y:5 X — Y= 
22; Regression equation of Y on X : 64 X —45 Y = 24: 
Find out the following measures from these measures : 
(a) Mean of X and Y 
(b) Correlation coefficient between X and Y 
(c) Standard deviation of Y 
Solution : 
(a) Calculation of $\bar{X}$ and $\bar{Y}$ :
Regression equation of X on Y 
5X — Y= 22....(i) 
Regression equation of Y on X 
64 X -—45 Y= 24 ....(ii) 


For solving these two equations multiply equation (i) by 45 and 
subtract equation (ii) from it : 


225 X —45 Y = 990 
64 X —45 Y = 24 


-----------------
161 X = 966
∴ X = 966/161 = 6


Putting the value of X in equation (i),
5(6) - Y = 22

30 - Y = 22

∴ Y = 30 - 22 = 8

Since both regression lines intersect each other at $(\bar{X}, \bar{Y})$, therefore
$\bar{X} = 6$ and $\bar{Y} = 8$.

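(b) To find the coefficient of correlation between X and Y, the
regression coefficients $b_{xy}$ and $b_{yx}$ are determined first.

∵ Regression equation of X on Y

5 X - Y = 22
∴ 5 X = Y + 22
or $X = \dfrac{1}{5}Y + \dfrac{22}{5}$
So, $b_{xy} = \dfrac{1}{5}$

Regression equation of Y on X,
64 X - 45 Y = 24

∴ - 45 Y = - 64 X + 24

Multiplying both sides by - 1,

45 Y = 64 X - 24

$Y = \dfrac{64}{45}X - \dfrac{24}{45}$

So $b_{yx} = \dfrac{64}{45}$
∴ $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{\dfrac{1}{5} \times \dfrac{64}{45}} = \sqrt{\dfrac{64}{225}} = \dfrac{8}{15} = 0.53$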


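(c) Calculation of the standard deviation of Y :

$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$

and given : Variance of X $(\sigma_x^2) = 25$

∴ $\sigma_x = \sqrt{25} = 5$

Putting the values of r, $\sigma_x$ and $b_{yx}$,

$\dfrac{64}{45} = \dfrac{8}{15} \times \dfrac{\sigma_y}{5}$

$\sigma_y = \dfrac{64}{45} \times \dfrac{15}{8} \times 5$

$= \dfrac{8}{3} \times 5 = \dfrac{40}{3} = 13.33$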
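The three parts of Illustration 7 can be recomputed with exact fractions as a check. This is only a sketch: each line is written as a hypothetical tuple (a, b, c) standing for aX + bY = c, and the roles of the two lines (X on Y, Y on X) are taken as given in the question.

```python
from fractions import Fraction
import math

def intersection(line1, line2):
    """Solve a1*X + b1*Y = c1 and a2*X + b2*Y = c2 by Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

x_on_y = (5, -1, 22)     # 5X - Y = 22
y_on_x = (64, -45, 24)   # 64X - 45Y = 24

mean_x, mean_y = intersection(x_on_y, y_on_x)

# X on Y solved for X gives slope -b/a; Y on X solved for Y gives slope -a/b
b_xy = Fraction(-x_on_y[1], x_on_y[0])
b_yx = Fraction(-y_on_x[0], y_on_x[1])

# r is the geometric mean of the coefficients and carries their sign
r = math.copysign(math.sqrt(float(b_xy * b_yx)), float(b_xy))

sd_x = math.sqrt(25)                 # variance of X is 25
sd_y = float(b_yx) * sd_x / r        # from b_yx = r * sd_y / sd_x
print(mean_x, mean_y, round(r, 2), round(sd_y, 2))
```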


Illustration 8. 


For 50 students of a class the regression equation of marks in Statistics ( X )
on the marks in Accountancy ( Y ) is 3 Y - 5 X + 180 = 0. The mean marks in
Accountancy ( Y ) are 44 and the variance of marks in Statistics is 9/16 of the
variance of marks in Accountancy. Find the mean marks in Statistics and the
coefficient of correlation between marks in Statistics and Accountancy.

Solution : 

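In the above question, the marks in Statistics are taken as the X-variable
and the marks in Accountancy as the Y-variable. The following values
are given :

N = 50, $\bar{Y} = 44$

The ratio of $\sigma_x^2$ to $\sigma_y^2$ is $\dfrac{9}{16}$, that is $\dfrac{\sigma_x}{\sigma_y} = \dfrac{3}{4} = 0.75$

Calculation of $\bar{X}$ : Regression line X on Y
3 Y - 5 X + 180 = 0
∵ This line passes through $(\bar{X}, \bar{Y})$
∴ $3\bar{Y} - 5\bar{X} + 180 = 0$
Now putting $\bar{Y} = 44$,
$3 \times 44 - 5\bar{X} + 180 = 0$ or $132 - 5\bar{X} + 180 = 0$
or $312 - 5\bar{X} = 0$
or $5\bar{X} = 312$

∴ $\bar{X} = \dfrac{312}{5} = 62.4$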

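Coefficient of correlation ( r )
∵ Regression line X on Y
3 Y - 5 X + 180 = 0
∴ 5 X = 3 Y + 180

or $X = \dfrac{3}{5}Y + \dfrac{180}{5}$

or X = 0.6 Y + 36

So $b_{xy} = 0.6$, and since $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$,

0.6 = r × 0.75

$r = \dfrac{0.6}{0.75} = 0.8$

Illustration 9.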


Given : X + 2 Y - 5 = 0,
2 X + 3 Y - 8 = 0,

$\sigma_x^2 = 12$

Find : (i) $\bar{X}$, (ii) $\bar{Y}$, (iii) $\sigma_y^2$, (iv) r (Jabalpur 2005)
Solution : 

In problems of this type, it is first essential to determine which
equation is the regression line X on Y and which is Y on X. For this we
assume either one of the equations to be X on Y and rewrite it in that
form. In the above example, let the first equation be the regression
line X on Y.

X = - 2 Y + 5

Now if the first equation is X on Y then the second will be the
regression line Y on X. From the second equation,

3 Y = - 2 X + 8
$Y = - \dfrac{2}{3}X + \dfrac{8}{3}$

If our guess is correct then the value of $b_{xy} \times b_{yx}$ should not be
more than 1, because $r^2 = b_{xy} \times b_{yx}$ and the value of r is never
more than 1.
Here, $b_{xy} = - 2$ and $b_{yx} = - \dfrac{2}{3}$

So $b_{xy} \times b_{yx} = (- 2)\left(- \dfrac{2}{3}\right) = \dfrac{4}{3}$, which is more than 1, so the value of
r would be more than 1, and that is never possible.

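So our assumption is not correct and the equations must be taken the
opposite way, i.e., the first equation is Y on X and the second
equation is X on Y. Writing the equations in the appropriate form,

Regression equation X on Y,

2 X + 3 Y - 8 = 0

$X = - \dfrac{3}{2}Y + 4$, so $b_{xy} = - \dfrac{3}{2}$

Regression equation Y on X,

X + 2 Y - 5 = 0

$Y = - \dfrac{1}{2}X + \dfrac{5}{2}$, so $b_{yx} = - \dfrac{1}{2}$

(i) $r = - \sqrt{b_{xy} \times b_{yx}}$ (negative, because both regression coefficients are negative)

$= - \sqrt{\left(- \dfrac{3}{2}\right)\left(- \dfrac{1}{2}\right)} = - \sqrt{\dfrac{3}{4}} = - 0.866$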


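(ii) Calculation of variance of series Y ($\sigma_y^2$) :

$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$, so $\sigma_y^2 = \left(\dfrac{b_{yx}}{r}\right)^2 \sigma_x^2$

Substituting the values,
$\sigma_y^2 = \left(\dfrac{- 0.5}{- 0.866}\right)^2 \times 12$

$= \dfrac{0.25 \times 12}{0.75}$

$\sigma_y^2 = 4$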


(iii) To find the values of $\bar{X}$ and $\bar{Y}$, the original equations are
solved as simultaneous equations :

X + 2 Y - 5 = 0 ....(1)

2 X + 3 Y - 8 = 0 ....(2)

Multiplying equation (1) by 2 and subtracting equation (2) from it,

2 X + 4 Y - 10 = 0

2 X + 3 Y - 8 = 0

-----------------
Y - 2 = 0
Y = 2, so $\bar{Y} = 2$

Substituting the value of Y in equation (1),

X + 2 × 2 - 5 = 0

X - 1 = 0

X = 1, so $\bar{X} = 1$
Illustration 10. 

In a partially destroyed laboratory record of an analysis of correlation data, the
following results only are legible : (Indore M.Com. 2005; Vikram 2005)


Variance of X =9 
Regression lines : 


8 X-—10 Y +66=0 
40 xX -18 Y =214 


Find : 

(a) the mean of X and Y 

(b) the standard deviation of Y 

(c) the coefficient of correlation between X and Y
Solution : 

Since the regression lines cut each other at $(\bar{X}, \bar{Y})$, we solve both
equations to find the values of $\bar{X}$ and $\bar{Y}$.

8 X - 10 Y + 66 = 0

8 X - 10 Y = - 66 ....(1)

40 X - 18 Y = 214 ....(2)

Multiplying equation (1) by 5 and subtracting equation (2) from it,

40 X - 50 Y = - 330

- 40 X + 18 Y = - 214
----------------------
- 32 Y = - 544

$Y = \dfrac{- 544}{- 32} = 17$

Substituting Y = 17 in equation (1),
8 X - 10 × 17 = - 66

or 8 X - 170 = - 66

or 8 X = 170 - 66 = 104

or X = 13

Thus, the values of $\bar{X}$ and $\bar{Y}$ are 13 and 17 respectively.
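We can write the regression lines in the following way. From the
first line,

$0.8 X - Y = - \dfrac{66}{10}$

or Y = + 0.8 X + 6.6
This line is the regression line Y on X, so

$b_{yx} = + 0.8 = r\dfrac{\sigma_y}{\sigma_x}$

Now the second line :
40 X - 18 Y = 214

or X = 0.45 Y + 5.35, so $b_{xy} = 0.45$

then $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.8 \times 0.45} = \sqrt{0.360} = 0.6$

Standard deviation of Y : the variance of X is 9, so $\sigma_x = 3$, and

$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$

$0.8 = 0.6 \times \dfrac{\sigma_y}{3}$

$\sigma_y = \dfrac{0.8 \times 3}{0.6} = 4$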


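Note : Suppose instead that the first equation is X on Y and the
second equation is Y on X.
8 X - 10 Y + 66 = 0
or 8 X = 10 Y - 66
or $X = \dfrac{10}{8}Y - \dfrac{66}{8}$, so $b_{xy} = \dfrac{10}{8}$
Second equation,
40 X - 18 Y = 214

or $Y = \dfrac{40}{18}X - \dfrac{214}{18}$, so $b_{yx} = \dfrac{40}{18}$

then $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{\dfrac{10}{8} \times \dfrac{40}{18}} = \sqrt{2.78} = 1.67$

But the value of r is never more than 1. Hence the first equation will be
Y on X and the second equation will be X on Y.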
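The trial-and-check procedure used in Illustrations 9 and 10 can be sketched as a small routine. It is only an illustration: each line is a hypothetical tuple (a, b, c) meaning aX + bY = c, and the first assignment whose coefficients have the same sign and a product not exceeding 1 is accepted.

```python
from fractions import Fraction

def identify(line1, line2):
    """Return (b_xy, b_yx) for the assignment of roles that is admissible."""
    for x_on_y, y_on_x in ((line1, line2), (line2, line1)):
        b_xy = Fraction(-x_on_y[1], x_on_y[0])   # X = -(b/a) Y + c/a
        b_yx = Fraction(-y_on_x[0], y_on_x[1])   # Y = -(a/b) X + c/b
        product = b_xy * b_yx
        # r squared equals the product, so it must lie between 0 and 1
        if 0 <= product <= 1:
            return b_xy, b_yx
    raise ValueError("neither assignment gives a valid correlation")

# Illustration 10: 8X - 10Y = -66 and 40X - 18Y = 214
print(identify((8, -10, -66), (40, -18, 214)))
# Illustration 9: X + 2Y = 5 and 2X + 3Y = 8
print(identify((1, 2, 5), (2, 3, 8)))
```

If both assignments happened to be admissible, the sketch would simply return the first; the illustrations in the text do not cover that case.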
Illustration 11.

Find out two regression lines from the following data. 


X    5   8   7   6   4

Y    3   4   5   2   1
Solution : 

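  X      Y      X²     Y²     XY
  5      3      25      9     15
  8      4      64     16     32
  7      5      49     25     35
  6      2      36      4     12
  4      1      16      1      4
ΣX = 30  ΣY = 15  ΣX² = 190  ΣY² = 55  ΣXY = 98

$\bar{X} = \dfrac{30}{5} = 6$, $\bar{Y} = \dfrac{15}{5} = 3$

$b_{xy} = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum Y^2 - (\sum Y)^2}$

$= \dfrac{5 \times 98 - (30)(15)}{5 \times 55 - (15)^2} = \dfrac{490 - 450}{275 - 225} = \dfrac{40}{50} = 0.80$

$b_{yx} = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$

$= \dfrac{5 \times 98 - (30)(15)}{5 \times 190 - (30)^2} = \dfrac{490 - 450}{950 - 900} = \dfrac{40}{50} = 0.80$

The regression line X on Y :

$X - \bar{X} = b_{xy}(Y - \bar{Y})$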


or X —-6=0.8( Y—-3) 
or X -6=0.8 Y-2.4 
or X-0.8 Y=6-2.4 
or X —0.8 Y=3.6 
or X=0.8 Y + 3.6 
The regression line Y on X : 


$Y - \bar{Y} = b_{yx}(X - \bar{X})$


or Y-3=0.8( X -6) 
or Y—3=0.8 X —4.8 
or Y=0.8 X - 1.8 
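The direct method of Illustration 11 works from the raw sums alone. A minimal sketch (the function name is illustrative) that returns both regression coefficients from paired data:

```python
def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) computed from the raw sums of the paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    b_xy = numerator / (n * syy - sy ** 2)
    b_yx = numerator / (n * sxx - sx ** 2)
    return b_xy, b_yx

xs = [5, 8, 7, 6, 4]
ys = [3, 4, 5, 2, 1]
b_xy, b_yx = regression_coefficients(xs, ys)

mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
# Constant terms of the two lines X = a1 + b_xy * Y and Y = a2 + b_yx * X
a1 = mean_x - b_xy * mean_y
a2 = mean_y - b_yx * mean_x
print(b_xy, b_yx, round(a1, 2), round(a2, 2))
```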
Illustration 12. 

Find the regression lines for the following data of X and Y. Find the
probable value of Y when
X = 6.2 :

X    1   2   3   4   5   6   7   8   9

Y    9   8  10  12  11  13  14  16  15

(Jabalpur 2006) 


Solution : 


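  X      Y    x = X - 5   y = Y - 12    x²     y²     xy
  1      9       -4          -3         16      9     12
  2      8       -3          -4          9     16     12
  3     10       -2          -2          4      4      4
  4     12       -1           0          1      0      0
  5     11        0          -1          0      1      0
  6     13        1           1          1      1      1
  7     14        2           2          4      4      4
  8     16        3           4          9     16     12
  9     15        4           3         16      9     12
Total 45  108     0           0         60     60     57

N = 9, $\bar{X} = \dfrac{\sum X}{N} = \dfrac{45}{9} = 5$, $\bar{Y} = \dfrac{\sum Y}{N} = \dfrac{108}{9} = 12$

$b_{xy} = \dfrac{\sum xy}{\sum y^2} = \dfrac{57}{60} = 0.95$, $b_{yx} = \dfrac{\sum xy}{\sum x^2} = \dfrac{57}{60} = 0.95$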


The regression line X on Y : $X - \bar{X} = b_{xy}(Y - \bar{Y})$
∴ X - 5 = 0.95 (Y - 12)
= 0.95 Y - 11.40
or X = 0.95 Y - 11.40 + 5
∴ X = 0.95 Y - 6.4
The regression line Y on X :
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
Y - 12 = 0.95 (X - 5)
= 0.95 X - 4.75
or Y = 0.95 X - 4.75 + 12
∴ Y = 0.95 X + 7.25
When X = 6.2, then

Y = 0.95 × 6.2 + 7.25
= 5.89 + 7.25 = 13.14

Correlation coefficient, $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.95 \times 0.95} = 0.95$
Illustration 13. 

From the following data find two regression lines X on Y and Y on X . If Y = 
30, find the value of X . 

Age of Husband 18 19 20 21 22 23 24 25 26 27 

Age of Wife 17 17 18 18 18 19 19 20 21 22 
Solution : 

Let Age of husband = X , Age of wife = Y 

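  X     Y    dx = X - 22   dy = Y - 18    dx²    dy²    dxdy
 18    17        -4            -1          16      1      4
 19    17        -3            -1           9      1      3
 20    18        -2             0           4      0      0
 21    18        -1             0           1      0      0
 22    18         0             0           0      0      0
 23    19         1             1           1      1      1
 24    19         2             1           4      1      2
 25    20         3             2           9      4      6
 26    21         4             3          16      9     12
 27    22         5             4          25     16     20
Total             5             9          85     33     48

Given : N = 10, $\bar{X} = 22 + \dfrac{\sum dx}{N} = 22 + \dfrac{5}{10} = 22.5$
Similarly, $\bar{Y} = 18 + \dfrac{\sum dy}{N} = 18 + \dfrac{9}{10} = 18.9$

Regression coefficient of Y on X :

$b_{yx} = \dfrac{\sum dxdy - (\sum dx)(\sum dy)/N}{\sum dx^2 - (\sum dx)^2/N}$

$= \dfrac{48 - (5)(9)/10}{85 - (5)^2/10} = \dfrac{48 - 4.5}{85 - 2.5} = \dfrac{43.5}{82.5} = 0.527$

Regression coefficient of X on Y :

$b_{xy} = \dfrac{\sum dxdy - (\sum dx)(\sum dy)/N}{\sum dy^2 - (\sum dy)^2/N}$

$= \dfrac{48 - (5)(9)/10}{33 - (9)^2/10} = \dfrac{48 - 4.5}{33 - 8.1} = \dfrac{43.5}{24.9} = 1.747$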
Regression line Y on X :

$Y - \bar{Y} = b_{yx}(X - \bar{X})$

Y - 18.9 = 0.527 (X - 22.5)

Y - 18.9 = 0.527 X - 11.8575

∴ Y = 0.527 X - 11.8575 + 18.9
or Y = 0.527 X + 7.0425

Regression line X on Y :

$X - \bar{X} = b_{xy}(Y - \bar{Y})$

X - 22.5 = 1.747 (Y - 18.9)

= 1.747 Y - 33.0183

∴ X = 1.747 Y - 33.0183 + 22.5
or X = 1.747 Y - 10.5183


Calculation of value of X when Y = 30 

- Regression line X on Y is: 

X = 1.747 Y-— 10.5183 

Substituting the value of Y , 

X = 1.747 x 30 — 10.5183 

X= 41.89 
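The assumed-mean (short cut) working of Illustration 13 can be reproduced as below. The function name and return order are illustrative; the assumed means 22 and 18 are the ones used in the solution.

```python
def shortcut_regression(xs, ys, assumed_x, assumed_y):
    """Means and regression coefficients from deviations about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = s_dxdy - s_dx * s_dy / n
    b_yx = numerator / (s_dx2 - s_dx ** 2 / n)
    b_xy = numerator / (s_dy2 - s_dy ** 2 / n)
    mean_x = assumed_x + s_dx / n
    mean_y = assumed_y + s_dy / n
    return mean_x, mean_y, b_xy, b_yx

husbands = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wives = [17, 17, 18, 18, 18, 19, 19, 20, 21, 22]
mean_x, mean_y, b_xy, b_yx = shortcut_regression(husbands, wives, 22, 18)

# Estimated age of husband (X) when the wife's age (Y) is 30
x_at_30 = mean_x + b_xy * (30 - mean_y)
print(mean_x, mean_y, round(b_xy, 3), round(b_yx, 3), round(x_at_30, 2))
```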
Illustration 14. 

(A) Prove mathematically that : $r = \sqrt{b_{xy} \times b_{yx}}$

(B) If the two regression coefficients are 2 and 0.45 respectively, what will be 
the value of coefficient of correlation ? 

(C) If the two regression coefficients are — 0.85 and — 0.89 respectively, what 
will be the value of coefficient of correlation ? 

(D) A student calculated the values of the two regression coefficients as 1.12
( X on Y ) and 0.9 ( Y on X ) respectively. Show whether the student's
calculation is correct or not.


Solution : 
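(A) $r = \sqrt{b_{xy} \times b_{yx}}$
We know that, $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ ....(i)
and $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$ ....(ii)
Multiplying both equations,

$b_{xy} \times b_{yx} = r\dfrac{\sigma_x}{\sigma_y} \times r\dfrac{\sigma_y}{\sigma_x}$

$b_{xy} \times b_{yx} = r^2$
∴ $r = \sqrt{b_{xy} \times b_{yx}}$

(B) $r = \sqrt{b_{xy} \times b_{yx}}$. Given : $b_{xy} = 2$, $b_{yx} = 0.45$

∴ $r = \sqrt{2 \times 0.45} = \sqrt{0.9} = 0.95$

(C) Given : $b_{xy} = - 0.85$, $b_{yx} = - 0.89$
$r = - \sqrt{0.85 \times 0.89} = - \sqrt{0.7565}$
r = - 0.87
(D) We know that,
$b_{xy} \times b_{yx} \le 1$
$b_{xy} = 1.12$, $b_{yx} = 0.9$
∴ $b_{xy} \times b_{yx} = 1.12 \times 0.9$
= 1.008 > 1, which is not possible.
So in the above case, the student's calculation is not
correct, because the value of $b_{xy} \times b_{yx}$ is greater than one.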
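Parts (B), (C) and (D) apply one rule: r is the geometric mean of the two regression coefficients, takes their common sign, and is impossible when their product exceeds 1. A minimal sketch with an illustrative function name:

```python
import math

def correlation_from_coefficients(b_xy, b_yx):
    """r = +/- sqrt(b_xy * b_yx), with the sign shared by both coefficients."""
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients exceeds 1")
    r = math.sqrt(product)
    return -r if b_xy < 0 else r

print(round(correlation_from_coefficients(2, 0.45), 2))       # part (B)
print(round(correlation_from_coefficients(-0.85, -0.89), 2))  # part (C)

try:
    correlation_from_coefficients(1.12, 0.9)                  # part (D)
except ValueError as error:
    print("not possible:", error)
```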


Illustration 15.

The following information is given to you :

$\sum (X - 58) = 0$, $\sum (X - 58)^2 = 3086$

$\sum (Y - 58) = 0$, $\sum (Y - 58)^2 = 483$

$\sum (X - 58)(Y - 58) = 1095$, N = 7

You are required to determine :
(i) Both regression equations

(ii) Coefficient of correlation between X and Y series

Solution :
Given :
$\sum (X - 58) = \sum dx = 0$, $\sum (X - 58)^2 = \sum dx^2 = 3086$
$\sum (Y - 58) = \sum dy = 0$, $\sum (Y - 58)^2 = \sum dy^2 = 483$

$\sum (X - 58)(Y - 58) = \sum dxdy = 1095$, N = 7

∵ $\sum dx = 0$ and $\sum dy = 0$, i.e., the deviations are taken from the actual
means. So $\bar{X} = 58$ and $\bar{Y} = 58$.
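Regression coefficients
X on Y : $b_{xy} = \dfrac{\sum dxdy}{\sum dy^2} = \dfrac{1095}{483} = 2.27$
Y on X : $b_{yx} = \dfrac{\sum dxdy}{\sum dx^2} = 0.36$
(i) Regression equations :
X on Y

$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$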


( X — 58) = 2.27 ( Y — 58) 
(X —58) = 2.27 Y — 131.66 
X=2.27 Y — 131.66 + 58 
X= 2.27 Y — 73.66 

Y on X

$(Y - \bar{Y}) = b_{yx}(X - \bar{X})$


( Y — 58) = 0.36 ( X — 58) 
( Y — 58) = 0.36 X — 20.88 
Y = 0.36 X — 20.88 + 58 

Y = 0.36 X + 37.12

(ii) $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{2.27 \times 0.36} = 0.90$


Illustration 16. 


From the following data determine both regression equation by least squares 
method : 


X    1   2   3   4   5
Y    2   5   3   8   7
Solution : 
  X      Y      X²     Y²     XY
  1      2       1      4      2
  2      5       4     25     10
  3      3       9      9      9
  4      8      16     64     32
  5      7      25     49     35
ΣX = 15  ΣY = 25  ΣX² = 55  ΣY² = 151  ΣXY = 88

Regression equation of X on Y

$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
15 = 5 a + 25 b ....(i)
88 = 25 a + 151 b ....(ii)

Multiplying equation (i) by 5 and subtracting it from equation (ii),
88 = 25 a + 151 b

75 = 25 a + 125 b
------------------
13 = 26 b

$b = \dfrac{13}{26} = 0.5$

Substituting the value of b in equation (i),
15 = 5 a + 25 × 0.5
15 = 5 a + 12.5
15 - 12.5 = 5 a
2.5 = 5 a
$a = \dfrac{2.5}{5} = 0.5$

X = a + b Y
X = 0.5 + 0.5 Y
Regression line Y on X

$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$

25 = 5 a + 15 b ....(i)

88 = 15 a + 55 b ....(ii)

Multiplying equation (i) by 3 and subtracting it from equation (ii),
88 = 15 a + 55 b

75 = 15 a + 45 b
------------------
13 = 10 b
$b = \dfrac{13}{10} = 1.3$
Substituting the value of b in equation (i),

25 = 5 a + 15 × 1.3

25 = 5 a + 19.5

5.5 = 5 a

$a = \dfrac{5.5}{5} = 1.1$
Y = a + b X

Y = 1.1 + 1.3 X
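The two normal equations of the least squares method form a pair of simultaneous equations in a and b, so they can be solved mechanically. The sketch below uses Cramer's rule; the function name is illustrative, and swapping the arguments gives the other regression line.

```python
def least_squares_line(independent, dependent):
    """Solve  sum(D) = N a + b sum(I)  and  sum(I D) = a sum(I) + b sum(I^2)."""
    n = len(independent)
    s_i = sum(independent)
    s_d = sum(dependent)
    s_id = sum(i * d for i, d in zip(independent, dependent))
    s_ii = sum(i * i for i in independent)
    det = n * s_ii - s_i ** 2
    a = (s_d * s_ii - s_i * s_id) / det
    b = (n * s_id - s_i * s_d) / det
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
a_yx, b_yx = least_squares_line(xs, ys)   # Y on X : Y = a + bX
a_xy, b_xy = least_squares_line(ys, xs)   # X on Y : X = a + bY
print(a_yx, b_yx, a_xy, b_xy)
```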


Illustration 17. 

The following table gives the ages of husbands and wives for 50 newly married
couples. Find the two regression lines. Also estimate (a) the age of the husband
when the wife is 20, and (b) the age of the wife when the husband
is 30.

Age of Wives          Age of Husbands          Total
                  20-25    25-30    30-35
16-20                9       14       -          23
20-24                6       11       3          20
24-28                -        -       7           7
Total               15       25      10          50
Solution : 


Here class interval for x, $i_x = 5$
and class interval for y, $i_y = 4$.

Age of husband ( x )      20-25    25-30    30-35
Mid value ( M.V. )         22.5     27.5     32.5
dx                           -5        0       +5
dx'                          -1        0       +1

Age of wife ( y )  M.V.   dy   dy'    20-25  25-30  30-35   Total f   fdy'   fdy'²   fdx'dy'
16-20               18    -4   -1       9     14      -        23      -23     23       9
20-24               22     0    0       6     11      3        20        0      0       0
24-28               26    +4   +1       -      -      7         7        7      7       7
Total f                                15     25     10     N = 50     -16     30      16
fdx'                                  -15      0     10     Σfdx' = -5
fdx'²                                  15      0     10     Σfdx'² = 25
fdx'dy'                                 9      0      7     Σfdx'dy' = 16
= 16 
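Regression coefficient x on y

$b_{xy} = \dfrac{\sum fdx'dy' \times N - (\sum fdx')(\sum fdy')}{\sum fdy'^2 \times N - (\sum fdy')^2} \times \dfrac{i_x}{i_y}$

$= \dfrac{16 \times 50 - (- 5)(- 16)}{30 \times 50 - (- 16)^2} \times \dfrac{5}{4} = \dfrac{800 - 80}{1500 - 256} \times \dfrac{5}{4}$

$= \dfrac{720}{1244} \times \dfrac{5}{4} = 0.5788 \times 1.25 = 0.723$

Regression coefficient y on x

$b_{yx} = \dfrac{\sum fdx'dy' \times N - (\sum fdx')(\sum fdy')}{\sum fdx'^2 \times N - (\sum fdx')^2} \times \dfrac{i_y}{i_x}$

$= \dfrac{16 \times 50 - (- 5)(- 16)}{25 \times 50 - (- 5)^2} \times \dfrac{4}{5} = \dfrac{800 - 80}{1250 - 25} \times \dfrac{4}{5}$

$= \dfrac{720}{1225} \times \dfrac{4}{5} = 0.5878 \times 0.8 = 0.47$

Correlation coefficient $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.723 \times 0.47} = 0.58$

$\bar{x} = A + \dfrac{\sum fdx'}{N} \times i_x = 27.5 + \dfrac{- 5}{50} \times 5 = 27.5 - 0.5 = 27$

$\bar{y} = A + \dfrac{\sum fdy'}{N} \times i_y = 22 + \dfrac{- 16}{50} \times 4 = 22 - 1.28 = 20.72$

Regression equation x on y : $x - \bar{x} = b_{xy}(y - \bar{y})$
x - 27 = 0.723 (y - 20.72)

x - 27 = 0.723 y - 14.98

x = 0.723 y - 14.98 + 27

x = 0.723 y + 12.02

(a) Calculation for the age of husband
Putting y = 20 in x = 0.723 y + 12.02,
x = 0.723 × 20 + 12.02

x = 14.46 + 12.02

x = 26.48 years

Regression equation y on x : $y - \bar{y} = b_{yx}(x - \bar{x})$
y - 20.72 = 0.47 (x - 27)
y - 20.72 = 0.47 x - 12.69
y = 0.47 x - 12.69 + 20.72

y = 0.47 x + 8.03
(b) Calculation for the age of wife

Putting x = 30 in y = 0.47 x + 8.03, y = 0.47 × 30 + 8.03 = 14.1 +
8.03
y = 22.13 years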
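The step-deviation working for a bivariate frequency table can be checked with a short script. This is a sketch under the layout of Illustration 17: rows are wives' age classes, columns are husbands' age classes, and all variable names are illustrative.

```python
# Frequencies: rows = wives 16-20, 20-24, 24-28; columns = husbands 20-25, 25-30, 30-35
freq = [[9, 14, 0],
        [6, 11, 3],
        [0, 0, 7]]
x_mid = [22.5, 27.5, 32.5]   # mid values of husbands' classes
y_mid = [18, 22, 26]         # mid values of wives' classes
assumed_x, assumed_y = 27.5, 22
i_x, i_y = 5, 4              # class intervals

dxs = [(m - assumed_x) / i_x for m in x_mid]   # step deviations -1, 0, +1
dys = [(m - assumed_y) / i_y for m in y_mid]

n = sum(sum(row) for row in freq)
cells = [(freq[i][j], dxs[j], dys[i]) for i in range(3) for j in range(3)]
s_fdx = sum(f * dx for f, dx, dy in cells)
s_fdy = sum(f * dy for f, dx, dy in cells)
s_fdx2 = sum(f * dx * dx for f, dx, dy in cells)
s_fdy2 = sum(f * dy * dy for f, dx, dy in cells)
s_fdxdy = sum(f * dx * dy for f, dx, dy in cells)

numerator = n * s_fdxdy - s_fdx * s_fdy
b_xy = numerator / (n * s_fdy2 - s_fdy ** 2) * i_x / i_y
b_yx = numerator / (n * s_fdx2 - s_fdx ** 2) * i_y / i_x
mean_x = assumed_x + s_fdx / n * i_x
mean_y = assumed_y + s_fdy / n * i_y
print(n, mean_x, mean_y, round(b_xy, 3), round(b_yx, 2))
```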


Exercise 


1. The following results were obtained from the data pertaining to marks obtained
by 100 students in Economics and Mathematics :

                       Economics   Mathematics
Mean                       55          80
Standard Deviation          9          12

Coefficient of correlation = 0.8.

Find the most probable marks in Economics of a student who gets 70 marks in
Mathematics.

[Ans. 49]

2. A study of wheat prices at Bhind and Gwalior yields the following data :

                       Bhind      Gwalior
Average Price         ₹ 2.463     ₹ 2.797
Standard Deviation    ₹ 0.326     ₹ 0.207

r = 0.774

Estimate from the above data the most likely price of wheat (a) at Bhind
corresponding to the price of ₹ 2.334 per kg at Gwalior, (b) at Gwalior
corresponding to the price of ₹ 3.092 per kg at Bhind.

[Ans. (a) ₹ 1.90, (b) ₹ 3.10]

3. The following table shows the means and standard deviations of the prices of
two shares on the Mumbai Stock Exchange :

Shares of       Mean      Standard Deviation
Ltd. Com. A    ₹ 39.5         ₹ 10.8
Ltd. Com. B    ₹ 47.5         ₹ 16.8

If the coefficient of correlation between the prices of the two shares is 0.42, find the
most likely price of share A corresponding to a price of ₹ 50 observed in the
case of share B.

[Ans. 40.18]

4. Given the following values, find the expected value of X when Y is 12. 
Average of X series = 25, Average of Y series = 22, r= + 0.8. S. D. of X 
series = 4, S. D. of Y series = 5. 

[Ans. 18.6 ] 

5. The following data relate to the marks of 500 students in Statistics ( X ) and
Business Administration ( Y ) in an examination :

Particulars    Statistics (X)   Business Administration (Y)
Mean Marks          72                    60
S.D.                16                    12

Sum of products of deviations of marks from their means = 61,400.

You are required to find out the following :
(A) Coefficient of Correlation, (B) Both Regression Coefficients,
(C) Both Regression Equations, (D) Value of Y, if X = 75

[Ans. (A) r = 0.64, (B) $b_{xy} = 0.853$, $b_{yx} = 0.48$, (C) X = 0.853 Y + 20.82,
Y = 0.48 X + 25.44, (D) Y = 61.44]
6. Given that :
Particulars    X-Series   Y-Series
Mean               5          4
S. D.           1.224      1.414

The number of items is 8 and the sum of the products of deviations from the
means of the X and Y series is + 6.
Find out :
(A) Two Regression Equations,
(B) Estimated value of X when Y = 5

[Ans. (A) X = 0.375 Y + 3.5, Y = 0.5 X + 1.5, (B) 5.375]
Calculation by Regression Coefficient 


7. Calculate the regression of X on Y and Y on X, and the coefficient of
correlation from the following data :

X    1   2   3   4   5

Y    9   8  10  12  11
[Ans. X = 0.8 Y - 5, Y = 0.8 X + 7.6, r = 0.8]

8. Find out the regression equation from the following data : 
X 27 27 27 28 28 18 29 29 30 31 

Y 18 18 19 20 21 21 22 23 24 25 

[Ans. X = 0.56 Y + 15.58, Y = 0.26 X + 13.98, $b_{xy} = 0.56$, $b_{yx} = 0.26$]
9. From the following data, find the regression equation of Y on X :
X    0   1   2   1   0   3   4   2   2   1
Y    0   2   1   3   1   3   4   2   1   2

[Ans. Y = 0.67 X + 0.83]


10. A record of maintenance cost is kept on 6 identical machines of different 
ages. Management wants to determine whether there is a functional relationship 
between machine age ( X ) and the maintenance cost ( Y ). The following data 
are obtained : 

Machine    1    2    3    4    5    6

X          2    1    3    2    1    3

Y (₹)     70   40  100   80   30  100

Find the regression equation of Y on X. What should be the maintenance cost
for a 4 year old
machine ?

[Ans. Y = 32.5 X + 5, Y = 135]

11. Calculate from the following data, the linear regression lines of Y on X and 
of X on Y and the co-efficient of correlation : 

X8134712 

Y7043681 

[Ans. Y = 0.51 X +2.25, X =0.44 Y + 1.89, r = 0.474] 


12. Calculate the regression coefficients for the data given below :

X    8   6   4   7   5

Y    9   8   5   6   2

[Ans. $b_{xy} = 0.4$, $b_{yx} = 1.2$]

13. The following data give the daily wages and expenditure on food of 10 
families : 

Wages (₹) ( X ) 120 90 80 150 130 140 110 95 75 105

Expenditure (₹) ( Y ) 40 36 40 45 40 44 45 38 40 35

Calculate the linear regression of expenditure on food ( Y ) on wages ( X ).

[Ans. Y = 0.079 X + 31.65] 

14. An investigation into the demand for television sets in 7 towns has resulted in 
the following data : 

Population (in 000) ( X ) 11 17 14 14 17 21 25

No. of T.V. Sets Demanded ( Y ) 15 27 27 30 34 38 46 

Fit a linear regression of Y on X and estimate the demand for T.V. sets for a 
town with a population of 30 thousands. 

[Ans. Y = 1.93 X — 1.81, Y = 56.09 or 56] 

15. The equations of the regression lines between two variables are expressed
as 2 X - 3 Y = 0 and
4 Y - 5 X - 8 = 0. Find $\bar{X}$ and $\bar{Y}$, the regression coefficients and the
correlation coefficient between
X and Y.

[Ans, *=8%=87=0% J 

16. The two regression lines obtained from certain data were Y = X + 5 and 16 
X =9 Y — 94. Find the variance of X , if the variance of Y is 16. Also find the 
covariance between X and Y . 


[Ans. Variance of X = 9, Covariance ( XY ) = 9] 


[Hint : Covariance ( XY ) = $r\sigma_x\sigma_y$]

17. Find the mean values of the two random variables x and y and the 
correlation coefficient between them when the two lines of regression are given 
by: 

5 x + 7 y - 22 = 0 and 6 x + 2 y - 20 = 0

If the variance of y is 15, find the standard deviation of x. (Bhopal 1998)

[Ans. $\bar{x} = 3$, $\bar{y} = 1$, r = - 0.488, $\sigma_x = 2.65$]

18. Regression lines Y on X and X on Y are given below : 

(1) Y =11.64-0.5 X (2) X = 19.13 -0.87 Y 

Find $\bar{X}$, $\bar{Y}$ and r.

[Ans. $\bar{X} = 15.91$, $\bar{Y} = 3.67$ and r = - 0.66] (Jabalpur 2004)


19. The following values are given relating to two variables :
Variance of X = 36

12 X - 15 Y + 99 = 0 and 64 X - 27 Y = 373
(i) Find $\bar{X}$ and $\bar{Y}$.

(ii) Calculate the standard deviation of Y.
(iii) Calculate the correlation coefficient between the two variables.

[Ans. $\bar{X} = 13$, $\bar{Y} = 17$, $\sigma_y = 8$ and r = 0.6]


20. The following regression equation were calculated from a correlation table : 
Y=0.5 X +25 


or X =0.4 Y + 22 
Calculate : 

(a) ¥ and y 

(b) Ratio of their S.Ds. 

(c) r 

(d) The most likely value of X if Y =8 


cea m OF oy = ae} 
Ans. X =40 Y =45,— (7=045, x= 25.2 
Oy Y 


21. For X and Y series which are correlated the two lines of regression are : 


5 X - 6 Y = - 90 and 15 X - 8 Y - 130 = 0

Find which is the regression of Y on X and which is of X on Y. Find the
means of the two series and the correlation coefficient.

[Ans. $\bar{X} = 30$, $\bar{Y} = 40$, r = 0.67, X on Y equation is X = 0.533 Y + 8.67 and Y on X
is Y = 0.833 X + 15]

22. Obtain the equations of the lines of regression and the standard error of the 
estimates from the following data : 

X    1   2   3   4   5

Y    6   8   7   6   8

[Ans. X = 0.5 Y - 0.5, Y = 0.2 X + 6.4, $S_{xy} = 1.34$, $S_{yx} = 0.85$]
23. Find the standard error of estimates from the following data :
X    1   3   4   6   8   9  11  14

Y    1   2   4   4   5   7   8   9

[Ans. $S_{xy} = 0.86$, $S_{yx} = 0.56$]
24. The following table relates to the ages of wives and husbands of 50 newly
married couples. Find
(1) two regression lines, (2) age of husband when the age of wife is 21 years,
(3) age of wife when the age of husband is 25 years, (4) correlation coefficient.

Age of wife          Age of Husband            Total
                 20-25    25-30    30-35
18-20               9       15       -           24
20-22               6       10       3           19
22-24               -        -       7            7
Total              15       25      10        N = 50

[Ans. Let X = Age of husband, Y = Age of wife (Bhopal 2005)

(1) X = 1.42 Y - 1.85, Y = 0.23 X + 14.11, (2) 27.97, (3) 19.86, (4) 0.57]


THEORETICAL QUESTIONS 


Long Answer Questions 


1. Define ‘Regression’. How does it differ from correlation ? Why are there two 
regression lines ? Under what conditions, can there be only one regression line 
? 

2. Distinguish between Correlation and Regression. Also discuss the various 
methods of measuring regression. 

3. What is Regression Line ? How it can be measured ? 

4. "The regression line give only a ‘best estimate’ of the quantity in question. We 
may assess the degree of uncertainty in this estimate by calculating a quantity 


known as the Standard Error of Estimate......... " Elucidate. 
Short Answer Questions 


1. Define regression.

2. Give two objectives of regression.

3. What is the difference between correlation and regression ?

4. Why are there two regression lines ?

5. Under what circumstances can there be only one regression line ?

6. Write the formulae for the regression coefficients of X on Y and Y on X by the
short cut method.

7. What is the standard error of estimate in regression ? 

8. How will you calculate the standard error of estimate in regression ? 

9. Prove that coefficient of correlation is the geometric mean of regression 
coefficients. 

10. From the given regression equations how will you decide which is for X on Y 


and which is for Y on X ? 


OBJECTIVE QUESTIONS 
State whether the following statements are true or false :
1. Regression analysis tells the average relation between two variables. 
2. The regression coefficient Y on X measures 1 unit variation in Y due to the 


change in X . 


3. Regression coefficient is independent of change of origin and scale. 

4. The regression line can be drawn without the use of least square method. 

5. If standard error is zero then there is no variation about the regression line and 
correlation is perfect. 


[Ans. 1. True, 2. False, 3. False, 4. True, 5. True]
Fill in the blanks : 


1. The statistical technique by which we can estimate unknown value of one 
variable from the known value of other variable is said to be........... 

2. Both regression coefficients cannot be .............. than 1.

3. The word regression was first used by ........

4. The closer the two regression lines are to each other, the ........ is the
value of correlation.

[Ans. 1. regression, 2. more, 3. Galton, 4. more]
Choose the correct answers : 


1. In Statistics, the word ‘Regression’ was first used by : 
(a) Spearman (b) Sir Francis Galton 
(c) Karl Pearson (d) Irving Fisher 


2. When did Sir Francis Galton use the word Regression for the first time :
(a) 1877 (b) 1977 
(c) 1677 (d) 1875 


3. The meaning of Regression is : 
(a) Forward movement (b) Return back 
(c) Go upward (d) Jump downward 


4. Given: r =0.8, *-* °’» value of b xy will be : 

(a) 64 (b) 6.4 

(c) 0.64 (d) 0.064 

5. The regression equations of Y on X and X on Y are 6 Y = 5 X + 90 and 15 X = 8 Y
+ 130, then the value of $\bar{Y}$ will be :

(a) 30 (b) 40 

(c) 50 (d) 60 


6. When the regression lines intersect each other at right angles, then the
correlation between them is :
(a) Positive (b) Negative
(c) Zero (d) Normal

7. If one regression coefficient is negative, then the other will be :
(a) Positive (b) Negative
(c) Positive or Negative (d) None of these


8. If regression equation for Y on X is6 Y —2 X =6, then value of b yx will be 


(a) © (b) = 
(c) 1 (d) 2 
9. Generally Regression lines are : 
(a) 1 (b) 2 
(c) 3 (d)4 


10. If two regression coefficient are 0.2 and 0.8, then the value of correlation 


coefficient will be : 
(a) 0.16 (b) 0.10 
(c) 0.4 (d) 0.6 


11. If two regression coefficients are — 0.4 and — 1.6, then the value of correlation 


coefficient will be : 
(a) — 0.8 (b) + 0.8 
(c) — 0.12 (d) — 0.64 


12. Both regression lines will coincide if : 

(a) r=+1(b) r=0 

(c) r =0.5 (d) r =0.1 

13. Two regression lines will be perpendicular if : 

(a) r=+1(b) r=0 

(c) r=— 1/2 (d) r=0.1 

14. The arithmetic mean of regression coefficients will be :


(a) more than r (b) less than r 


(c) equal to r (d) None of these 

15. Regression lines cut each other : 

(a) on mean of X and Y (b) on mean of X 
(c) on mean of Y (d) None of these 


16. The regression line Y on X minimises :

(a) Sum of squares of horizontal deviations

(b) Sum of squares of vertical deviations

(c) Sum of squares of both horizontal and vertical deviations 
(d) None of these. 


17. The regression line Y on X is drawn in such a way that : 

(a) 2x-Ye?=0 (fp) 2Er-x)=0 

(C) 20r-ve=0 (q) 20"-Ye=0 

[Ans. 1. (b), 2. (a), 3. (b), 4. (c), 5. (b), 6. (c), 7. (b), 8. (a), 9. (b), 10. (c), 
11. (a), 12. (a), 13. (b), 14. (a), 
15. (a), 16. (b), 17. (d).] 
INDEX NUMBER


* Meaning of Index Number
* Characteristic of Index Number
* Importance and Uses
* Construction of Index Number
* Cost of Living Index
* Fisher Ideal Index Number


MEANING OF INDEX NUMBER 


An index number is a special type of average which measures the
changes in a variable or a group of related variables on the basis of
time, place or other characteristics. For example, when we say that
the index number of wholesale prices in India is 300 for the period
1994-95 compared to 100 in 1980-81, it means that the prices of
commodities and services in 1994-95 were three times their level in
1980-81. To compare the price level of commodities, the prices of
selected representative commodities such as rice, wheat, clothes,
pulses, oil, house rent etc. are taken as 100 in the base year
1980-81, and the prices of these commodities in 1994-95 are
expressed as percentages of the base year prices. The average of
these percentages is then taken. This average of percentages is
known as an Index Number in statistics. Index numbers show the
change in the price level between two periods of time.


DEFINITION OF INDEX NUMBER 


Scholars have defined index numbers in the following ways :

According to Croxton and Cowden, “Index numbers are devices
for measuring differences in the magnitude of a group of related
variables.”

According to Horace Secrist, “Index numbers are a series of
numbers by which changes in the magnitude of a phenomenon are
measured from time to time or from place to place.”

According to Wessel and Willet, “An index number is a special
type of average that provides a measurement of relative change
from time to time or from place to place.”

According to Dr. A.L. Bowley, “A series of index number is a
series which reflects in its trend and fluctuation the movements of
some quantity to which it is related.”

According to C.T. Clark and L.L. Schkade, “An index number is
a percentage relative that compares economic measures in a given
period with the same measures at a fixed time period in the past.”

According to Murray Spiegel, “An index number is a statistical
measure designed to show changes in variable or a group of related
variables with respect to time, geographic location or other
characteristics.”

From the above definitions, it is clear that an index number is a
special type of average. It measures the relative changes of related
variables over different time periods or places.


CHARACTERISTICS OF INDEX NUMBERS 


On the basis of study and analysis of above definitions, the 
following characteristics are apparent in index numbers : 


(1) Numerically expressed : Index numbers express changes in
numbers. Generally we describe changes in words, such as "the price
has increased" or "demand has decreased". Index numbers state them
numerically; for example, a consumer price index number of 143
means that prices have increased by 43% over the base year.

(2) Relative measure of changes : Different commodities are
measured in different units; for example, edible items such as wheat
and rice are measured in quintals, liquids in litres and cloth in
metres. The change in the price of each commodity can be measured
separately from its actual prices, but actual prices or absolute
measures tell us little about the combined change in the prices of
all of them. Index numbers make a relative measure of the combined
change possible. That is why general wholesale price index numbers
are used for measuring the purchasing power of money.

(3) Representation in the form of average : Index numbers
represent changes in the form of averages. Since an average measures
the central tendency of the data, an index number indicates not only
the direction of the change but also its degree.

(4) Average of percentages : Index numbers are calculated in
percentages, so they are averages of percentages. While constructing
an index number, the base year is taken as 100 and the current
values are expressed as percentages of it, but the percent sign (%)
is not written.

(5) Base of comparison : Index numbers are used for measuring
changes over a period of time or between places. Time is most
commonly used for the comparison. For this, the price of a year, a
month or a part of it is taken as the base.


USES OF INDEX NUMBER 


(1) Helpful in studying trend : Index numbers are used to
measure the changes in any phenomenon over a period of time. The
trend of exports and imports, the balance of payments position of
the country, the trend of prices and industrial production, and the
rate of change in national income and per capita income may be
determined over a fixed period of time with the help of index
numbers. For example, on the basis of the index numbers of India's
import trade for the last ten years, it can be said that imports
have been increasing continuously. Similarly, conclusions can be
drawn about the trends of industrial production, income, wages and
prices in India from their index numbers. Important decisions can be
taken by analysing these trends. Thus index numbers are very useful
in the study of business conditions.

(2) Helpful in framing suitable policies : Since index numbers
help in studying the trends of various economic and business events,
many important economic and business decisions are taken on the
basis of these trends. For example, the government decides the
dearness allowance, salaries and wages of employees on the basis of
the cost of living index number and the consumer price index number.
Similarly, businessmen use the index numbers of demand and supply of
commodities and of the purchasing power of money in planning for the
future.

(3) Helpful in measuring the purchasing power of money :
Index numbers are helpful in measuring the actual purchasing power,
or value, of money. The value of money is measured on the basis of
price index numbers of commodities and services. A rising price
index number indicates that the value of money is decreasing;
conversely, a declining price index number indicates that the value
of money is increasing. For example, the value of the Indian rupee
in 1994 was only Rs. 0.05 (5 paise) as compared to 1970-71. It means
that we had to spend Rs. 2,000 in 1994 to obtain the same
commodities and services that we obtained for Rs. 100 in 1970-71. It
has been stated, “Measuring of changes in the value of money could
be possible only through the technique of Index Number.”

(4) Helpful in deflating various values : Index numbers are very
useful for deflating national income, i.e., for expressing it at
fixed (constant) prices. National or per capita income at current
prices does not show the actual rate of growth of the country or the
actual improvement in the living standard of the people. When
national or per capita income is deflated, or adjusted to fixed
prices, the effect of inflation and deflation is eliminated and the
real economic condition of the nation becomes known. Similarly, real
wages can be obtained by deflating money wages with the consumer
price index number. This shows the real position of labourers.

(5) Index numbers act as economic barometer : Just as a
barometer measures atmospheric pressure, index numbers measure the
economic and business pressures in the economy of a country. Index
numbers of prices, production, foreign exchange, bank deposits etc.
show the variations in the business activities of a country. In
effect, index numbers measure the pulse of the economy. Therefore
index numbers are called economic barometers. Simpson and Kafka
rightly stated, “Index numbers are today one of the most widely used
statistical devices ... they are used to take the pulse of the
economy and they have come to be used as indicator of inflationary
or deflationary tendencies.”
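
The purchasing power and deflation ideas in points (3) and (4) above reduce to two small calculations: purchasing power of money = 100 / price index, and real value = (money value × 100) / price index. The following is a minimal Python sketch under those assumptions. The function names and the wage and index figures in the deflation example are hypothetical, chosen only for illustration; the index of 2,000 is the Rs. 2,000 versus Rs. 100 example from the text.

```python
def purchasing_power(price_index, base=100.0):
    """Value of one unit of money relative to the base period."""
    return base / price_index


def deflate(money_value, price_index, base=100.0):
    """Convert a current-price value into base-period (real) terms."""
    return money_value * base / price_index


# Text example: goods costing Rs. 100 in the base year cost Rs. 2,000 later,
# so the price index is 2000 with the base year taken as 100.
rupee_value = purchasing_power(2000)

# Hypothetical money wages and consumer price index numbers (illustration only).
money_wages = [5000, 6000, 9000]
cpi = [100, 150, 200]
real_wages = [deflate(w, i) for w, i in zip(money_wages, cpi)]

print(rupee_value)
print(real_wages)
```

In this hypothetical series the money wage rises in every period, yet the real wage of the last period is lower than that of the first because prices have doubled.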


IMPORTANCE OF INDEX NUMBERS 


At present, index numbers have great practical significance.
They are used not only to estimate present economic conditions but
also to forecast the future. The main merits or uses of index
numbers are as follows :


(1) Simplify the complex facts : Index numbers express data in
the form of percentages, so the comparison of complex facts becomes
easy. For example, to measure changes in the cost of living directly
we would have to study every item of living expenditure, which is a
very complex task, but the changes can be measured easily with the
cost of living index number.

(2) Handy in comparative study : Comparative study of changes 
of various events can be made through index numbers. Index 
numbers measure the relative changes of facts so comparison of 
events can be done easily with respect to time and place. 

(3) Measure of changes : Index numbers measure changes over time
or between places. They are the only statistical device that makes
it possible to measure changes which cannot be measured directly.

(4) Helpful in deciding the salary, dearness allowance etc. :
Information about the changes in the prices of consumer goods is
obtained through the consumer price index number, and the salaries,
dearness allowance etc. of government employees are decided on its
basis. Minimum wages, dearness allowances etc. in industries are
decided on the same basis.

(5) Estimation of changes in national income : Estimation of 
changes in national income is possible with the help of index number 
and various government policies are prepared on this basis. 

(6) Utility for businessman : The businessman can estimate
future fluctuations on the basis of the price index numbers of
commodities. Thus index numbers are extremely useful for
businessmen.

(7) Analysis of time series : Time series can be analysed easily
by using index numbers. On this basis, the study of long-term and
short-term movements in a phenomenon becomes easy.

(8) Knowledge of changes in production : The changes in the 
production can be measured by the index numbers relating to the 
production. On this basis government can easily decide to which 
industries the encouragement and government aid should be given. 

(9) Knowledge of foreign trade : Important information about
the imports and exports of our country can be obtained from index
numbers relating to foreign trade. This is helpful in increasing the
foreign exchange reserves of our country.

(10) Other advantages : Index numbers offer various other
advantages, e.g., policies relating to price stability are made on
the basis of wholesale price index numbers. Control of the quantity
of money, credit deflation etc. are carried out by measuring the
purchasing power of money. Index numbers are also used in deciding
the policies of the LIC and the rates of interest of banks.


LIMITATIONS OF INDEX NUMBERS 


Despite the usefulness of index numbers, it is not possible to use 
them for studying all variations in every sphere of life. They also 
have some limitations which are given below : 


(1) Only estimation : Index numbers give only an indication of an
event or fact and do not describe the real situation. Only a
relative estimate of changes is possible on their basis. Changes in
individual units cannot be studied with them. The conclusions
obtained from index numbers are true only on the average.

(2) Lack of accuracy : Since only some selected representative
units, and not all units, are included in index numbers, they may
lack sufficient accuracy. If sufficient care is not taken in
selecting the base year or the representative commodities, the
conclusions may be fallacious.

(3) Difference in purposes : Index numbers are constructed
separately for different purposes. An index number constructed for
one purpose may not be suitable for another.

(4) Negligence of qualitative changes : No attention is given 
towards the quality of the commodities at the time of construction of 
index numbers. They do not reveal the qualitative changes of the 
commodities. 

(5) Limitations of the averages : The limitations of the averages 
which are used in the construction of index number may also affect 
the index number. 


PROBLEMS IN THE CONSTRUCTION OF INDEX 
NUMBERS 


We have to face the following problems while constructing index
numbers :

(1) To define purpose of the index number

(2) Selection of the base year

(3) Selection of commodities and services

(4) Selection of the prices of commodities and services

(5) Choice of an appropriate average

(6) Selection of the appropriate weights

(7) Selection of an appropriate formula

(1) To define purpose of the index number : Since index
numbers are of various types and serve various objects, their
purpose must be defined as precisely as possible before they are
constructed. It should be clearly decided what the index number is
to measure and why. For example, the purpose of a consumer price
index number is to measure the changes in the prices of the
commodities bought by consumers. So, while constructing such a price
index number, wholesale prices must not be included; retail prices,
which are directly related to consumers, should be selected. If the
purpose of a price index number is to measure the changes in the
living standard of poor families, special care should be taken not
to include the commodities used by the middle and upper income
classes. If we are not clear about the purpose of the index number,
it leads to confusion and a waste of time and resources. The other
problems of construction, such as the selection of the base year,
commodities, prices etc., are decided on the basis of the purpose of
the index number.

(2) Selection of the base year : Since index number measures 
the relative changes of a phenomenon, a base year or a reference 
period is selected while constructing an index number. The changes 
in the price of current year are measured by index number assuming 
the prices of this base year as 100 and they are compared with base 
year’s index number 100. Though the selection of a base year 
mainly depends upon the purpose of index number, yet the following 
facts should be taken into account in the selection of base year : 


(i) The base year should be normal : The base year should be a
period free from all sorts of abnormalities such as large
fluctuations in prices, inflation, depression, war or famine. Since
it is very difficult to select a base year that is normal in all
these respects, an average of 3 or 5 years may be taken as the base.
This diminishes the effect of extreme values.

(ii) The base year should not be too far in the past : If the
base year is chosen from years too distant in the past, correct
conclusions may not be obtained. In the past, the price index number
in India was constructed by taking 1970-71 as the base year. By
1989-90 the gap from that base year was 19 years, so 1970-71 was no
longer an appropriate base year. Later the price index number was
constructed by taking 1980-81 as the base year. This index has also
become impractical now, so the index number should be constructed by
taking 1990-91 as the base year. In fact, it is more proper and
practical to take the previous year as the base year for the current
year.

(iii) Selection of fixed base year or chain base year : While
selecting the base year, an important problem is whether the base
year should be fixed or changing. In the fixed base method one fixed
year is the base. The prices of this year are taken as 100, the
price indices of the other years are determined, and each is
compared with the base year index of 100. In the chain base method,
on the other hand, the value of the current year is compared with
the value of the preceding year and not with a fixed base. Which
method should be adopted ? To some extent, this question depends
upon the purpose of the index number.

(3) Selection of the commodities and services : While
constructing an index number it is important to decide which
commodities and services to include. This depends upon the purpose
of the index number. It is neither practical nor necessary to
include every commodity and service, so only representative
commodities should be selected. Representative commodities are those
which are most used by the people in their daily life and which
represent the tastes, habits and customs of the people for whom the
index number is constructed. For example, it is impractical to
select items like scooters, motor cars, refrigerators etc. while
constructing the consumer price index number for the working class.
While selecting the representative commodities, their qualities and
grades should also be taken into account. If index numbers are
constructed from the prices of commodities of different grades and
qualities at different times, the results will be confusing. Hence
standardized commodities and services should be included as
representative commodities and services. The second important
question is how many commodities should be included. No hard and
fast rule can be prescribed for this. As we know, “Larger the size
of the sample, more accurate will be the result,” but the number of
commodities should be reasonable, generally between 20 and 50.
Recently the Economic Advisor of our country has divided the
commodities included in the wholesale price index number into three
main groups. The first group consists of primary commodities, 80 in
number, classified into two sub-groups. The second group consists of
fuel, power, light and petroleum products, 10 in number, classified
in one group. The third group consists of manufactured products, 270
in number, classified into 11 groups. This new classification is
based on the internationally accepted standard industrial
classification.

(4) Selection of the prices of commodities and services : After
the selection of commodities and services, the next important
problem is to obtain suitable price quotations for them. Generally
there are three types of prices in the market : wholesale prices,
retail prices and controlled prices. The selection of suitable
prices out of these depends upon the purpose of the index number. If
a consumer price index number is to be constructed, retail prices
will be suitable because consumers do not buy at wholesale prices.
Similarly, controlled prices are also not suitable, because the
commodities offered by the government at controlled prices through
controlled shops may not be available to consumers in the quantities
they need. These points should be kept in mind while selecting the
prices for an index number.

(5) Choice of an appropriate average : Since an index number
is a special type of average which measures the comparative
variations of a phenomenon on the basis of time or place, a natural
question is which particular average, i.e., arithmetic mean, median,
mode, geometric mean or harmonic mean, should be selected for its
construction. Mode and harmonic mean are almost never used in the
construction of index numbers. Theoretically speaking, the geometric
mean is the best average for the construction of index numbers for
the following reasons :

(a) The geometric mean measures relative changes, and index
numbers are constructed to measure comparative variations.

(b) Unlike the simple arithmetic mean, the geometric mean does not
give more importance to the largest values of the series and is less
affected by extreme values.

(c) An index number constructed on the basis of the geometric mean
satisfies the time reversal test.

Hence the geometric mean should be used in the construction of index
numbers. In practice, however, the simple arithmetic mean is widely
used, because it is much simpler to calculate than the geometric
mean.
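
Reason (c) can be checked numerically. The time reversal test asks that the index of period 1 on period 0, multiplied by the index of period 0 on period 1, equals 1 when both are written as ratios rather than percentages. The sketch below is only an illustration: the three prices are hypothetical and the function names are not from the text.

```python
from math import prod


def price_relatives(current, base):
    """Ratio of current price to base price for each commodity."""
    return [c / b for c, b in zip(current, base)]


def arithmetic_index(current, base):
    """Simple arithmetic mean of price relatives (as a ratio, not x 100)."""
    rel = price_relatives(current, base)
    return sum(rel) / len(rel)


def geometric_index(current, base):
    """Geometric mean of price relatives (as a ratio, not x 100)."""
    rel = price_relatives(current, base)
    return prod(rel) ** (1 / len(rel))


# Hypothetical prices of three commodities in two periods.
p0 = [10, 20, 40]
p1 = [20, 20, 20]

am_product = arithmetic_index(p1, p0) * arithmetic_index(p0, p1)
gm_product = geometric_index(p1, p0) * geometric_index(p0, p1)

print(round(am_product, 4))  # differs from 1: the test fails
print(round(gm_product, 4))  # equals 1: the test is satisfied
```

With these figures the arithmetic mean shows a rise in both directions, which cannot both be true, while the geometric mean gives reciprocal results.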


(6) Selection of the appropriate weights : Weighting of the
commodities also has an important place in the construction of index
numbers. The commodities included in an index number are not all of
equal importance. For example, wheat and rice are more important to
a consumer than salt and matches.

In production, the manufacturing of cotton cloth is more important
than that of televisions. Therefore, the commodities are weighted
according to their importance while constructing index numbers. The
use of weights should be logical and rational. The choice of proper
and rational weights depends upon the purpose of the index number
and the available quantities of the items or commodities. Under the
rational weighting method, weights are assigned to the commodities
on the following bases :

(1) Quantity of commodities produced and their prices.

(2) Quantity of commodities consumed and their prices.

(3) Quantity of commodities sold or available for sale and their
prices.

Theoretically, in the construction of cost of living index numbers,
the quantities of the commodities consumed and the proportional
expenditure on them are considered proper weights. For general price
index numbers, the quantities produced, the quantities available for
sale, or weights proportional to value are proper and rational
weights. When the weights are assigned on the basis of quantity, it
is called quantity weighting, and when the weights are assigned on
the basis of value, it is called value weighting.

There may be the following two methods of assigning weights :

(i) Explicit and implicit weighting : Weights may be assigned to
the commodities either explicitly or implicitly. Under the explicit
weighting method, the weight of each commodity is decided on the
basis of the quantity or value of its production, consumption or
sale, so as to express the importance of the commodity. Under the
implicit weighting method, on the other hand, a commodity is given
more importance by including several of its varieties. For example,
if wheat is to be given three times the importance of other
commodities, three varieties of wheat may be included as against one
variety of each of the other commodities. This method is rarely
used.

(ii) Fixed and fluctuating weights : When weights once decided
are used for many years, they are said to be fixed weights. When the
weights of the commodities vary with time, they are said to be
fluctuating weights. Fluctuating weights are more suitable for the
construction of indices because they reflect the variations in the
relative importance of the commodities.


(7) Selection of an appropriate formula : Statistical experts
have used various formulae for constructing index numbers. The
selection of the most appropriate formula is therefore an important
problem.

Prof. Irving Fisher suggested an ideal formula after testing more
than 100 formulae. Prof. Fisher’s formula is called the ideal
formula because it satisfies both the time reversal test and the
factor reversal test. But Fisher’s formula also has some limitations
which create problems in practical use. No single formula can be
regarded as the best for constructing indices under all
circumstances. So the selection of a suitable formula depends upon
the purpose of the index number and the nature of the available
data.


CONSTRUCTION OF INDEX NUMBERS 


There are various methods of constructing index numbers. For
simplicity of study, they may be classified as follows :


Methods of Construction of Index Numbers

(A) Index No. of only one item
    (1) Fixed Base Index No.
    (2) Chain Base Index No.

(B) Index No. of a group of items
    (1) Simple Index No.
        (i) Simple average of relatives : Fixed Base I. No., Chain Base I. No.
        (ii) Simple aggregative method : Fixed Base I. No., Chain Base I. No.
    (2) Weighted Index No.
        (i) Cost of living I. No. : Family Budget Method, Aggregative
            Expenditure Method
        (ii) Other methods : Laspeyre’s method, Paasche’s method,
             Dorbish and Bowley’s method, Marshall-Edgeworth method,
             Walsh’s method, Kelly’s method, Fisher’s ideal method


FIND INDEX NUMBER OF ONLY ONE ITEM 


Index numbers can be calculated in the following two ways on the 
basis of prices of various years of a commodity : 


(1) Fixed Base Index Numbers 

(2) Chain Base Index Numbers


Fixed Base Index Numbers

We can take two types of fixed base for the construction of fixed 
base index number : 

(i) A particular time or a yearly base 

(ii) In the form of average or multi-yearly base 


(i) A particular time or a yearly base : Under this method a
particular year is taken as the base year for the whole study period
and its index is taken as 100. The prices of the subsequent years
are expressed as percentages of the price of this base year. The
formula for finding the index number by this method is :

Price Relative Index Number = (Price of current year / Price of base year) × 100

(ii) In the form of average or multi-yearly base : If the single
year selected as the base is not a normal year, the conclusions
drawn on its basis do not present the real situation. It is also
difficult to decide whether the selected year is normal or not. To
remove this difficulty, the average of a few years is taken as the
base. Such an average minimizes the effect of fluctuations. The
average price of three or five years is taken as 100 and the prices
of the subsequent years are expressed as price relatives of it. The
formula is :

Price Relative Index Number = (Price of current year / Average price of three (or five) years) × 100



Illustration 1.
Prepare price index numbers for all years, taking 2005 as the base year, from the
following data :


Year             2005   2006   2007   2008   2009   2010
Price (in Rs.)    120    140    150    165    175    240


Solution : Price index number of the current year taking 2005 as
base year

= (Price of current year / Price of base year) × 100


Year    Price    Calculation of Price Relative    Price Relative Index No.

2005    120      (base year)                      100.0
2006    140      (140 / 120) × 100                116.7
2007    150      (150 / 120) × 100                125.0
2008    165      (165 / 120) × 100                137.5
2009    175      (175 / 120) × 100                145.8
2010    240      (240 / 120) × 100                200.0

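
The calculation in Illustration 1 can be reproduced with a short Python sketch. The function name fixed_base_index is an assumption made for this example; the prices are those given in the illustration.

```python
def fixed_base_index(prices, base_year):
    """Price relative of every year on one fixed base year, rounded to 1 decimal."""
    base_price = prices[base_year]
    return {year: round(price / base_price * 100, 1)
            for year, price in prices.items()}


# Data of Illustration 1 (price of one commodity in each year).
prices = {2005: 120, 2006: 140, 2007: 150, 2008: 165, 2009: 175, 2010: 240}

index_2005_base = fixed_base_index(prices, base_year=2005)
for year, value in index_2005_base.items():
    print(year, value)
```

Changing base_year to any other year in the table gives the corresponding series on that base.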


Illustration 2.

Calculate price relative index numbers taking the average price of the years 2002
to 2004 as base :

Year             2002   2003   2004   2005   2006   2007   2008   2009
Price (in Rs.)    120    140    160    155    165    180    190    195


Solution :
Average price of the years from 2002 to 2004

= (120 + 140 + 160) / 3 = 420 / 3 = 140


Year    Price    Calculation of Price Relative    Price Relative Index No.

2002    120      (120 / 140) × 100                 85.7
2003    140      (140 / 140) × 100                100.0
2004    160      (160 / 140) × 100                114.3
2005    155      (155 / 140) × 100                110.7
2006    165      (165 / 140) × 100                117.9
2007    180      (180 / 140) × 100                128.6
2008    190      (190 / 140) × 100                135.7
2009    195      (195 / 140) × 100                139.3

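
When the base is an average of several years, as in Illustration 2, only the denominator changes. A minimal sketch follows; the function name average_base_index is hypothetical and the prices are those of the illustration.

```python
def average_base_index(prices, base_years):
    """Price relatives on the average price of the chosen base years."""
    base_price = sum(prices[y] for y in base_years) / len(base_years)
    return {year: round(price / base_price * 100, 1)
            for year, price in prices.items()}


# Data of Illustration 2.
prices = {2002: 120, 2003: 140, 2004: 160, 2005: 155,
          2006: 165, 2007: 180, 2008: 190, 2009: 195}

index_avg_base = average_base_index(prices, base_years=[2002, 2003, 2004])
for year, value in index_avg_base.items():
    print(year, value)
```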

Illustration 3.

Construct price index numbers taking the three years' average as base from the
following data :


                     Rate per rupee
Year            Wheat      Rice       Sugar

First Year      2 kg       1.0 kg     0.4 kg
Second Year     1.6 kg     0.8 kg     0.4 kg
Third Year      1 kg       0.75 kg    0.25 kg


Solution :

In this question the rate per rupee is given, so in order to find the
price index we first have to find the price per kg (price per kg =
1 / quantity per rupee).


                  Price per kg (in Rs.)

Year            Wheat      Rice       Sugar
First Year      0.50       1.00       2.50
Second Year     0.625      1.25       2.50
Third Year      1.00       1.33       4.00
Total           2.125      3.58       9.00
Average         0.71       1.19       3.00

Construction of Price Index Number

Price Relative = (Price of the year / Average price) × 100

                        First Year          Second Year         Third Year
Commodity   Average     Price    Price      Price    Price      Price    Price
            Price (p0)  (p1)     Relative   (p2)     Relative   (p3)     Relative

Wheat       0.71        0.50     70.4       0.625    88.0       1.00     140.8
Rice        1.19        1.00     84.0       1.25     105.0      1.33     111.8
Sugar       3.00        2.50     83.3       2.50     83.3       4.00     133.3
Total                            237.7               276.3               385.9

Average                          79.2                92.1                128.6

Thus,

Price Index Number : First year = 79.2
                     Second year = 92.1
                     Third year = 128.6
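
The two steps of Illustration 3 (converting rate per rupee into price per kg, then averaging the price relatives) can be sketched in Python as below. The function name is an assumption for this example. Because the code keeps the average prices unrounded, its results are close to the hand-worked figures but may differ in the first decimal place.

```python
def price_index_from_rates(rates_per_rupee):
    """Simple average of price relatives, base = average price of all years.

    rates_per_rupee maps each commodity to the quantity (kg) bought per rupee
    in each year; price per kg is the reciprocal of that quantity.
    """
    relatives_by_item = []
    for quantities in rates_per_rupee.values():
        prices = [1 / q for q in quantities]      # price per kg in each year
        base_price = sum(prices) / len(prices)    # average price of all years
        relatives_by_item.append([p / base_price * 100 for p in prices])
    n_items = len(relatives_by_item)
    # Average the relatives of all commodities year by year.
    return [sum(year_values) / n_items for year_values in zip(*relatives_by_item)]


# Data of Illustration 3 (kg per rupee in the first, second and third year).
rates = {
    "Wheat": [2.0, 1.6, 1.0],
    "Rice": [1.0, 0.8, 0.75],
    "Sugar": [0.4, 0.4, 0.25],
}

index_numbers = price_index_from_rates(rates)
print([round(x, 1) for x in index_numbers])
```

Since the base of each commodity is the average of its own three prices, the three index numbers always add up to 300.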


Chain Base Index Number 

There is no fixed base year in chain base index number. Under it, 
the indices of a year are constructed on the basis of immediately 
preceding year. For example, the year 1994 will be base year for the 
indices of year 1995, year 1993 will be base year for 1994 and 1992 
will be base year for 1993. Thus for chain base index numbers, the 
base year (preceding year) changes with every current year. 

Merits : 

(1) By constructing the indices through this method, the direction 
and the degree of the changes in current year as compared to 
previous year may be obtained. 

(2) Under this method new items can be included and some old
items can be excluded, so information about changes in the habits,
fashions and demand of the people may be obtained. However, this
method is not good for studying long-term changes.

The formula to determine the chain price relative is :

Chain Price Relative = (Price of current year / Price of preceding year) × 100

(1) The price of every year is expressed as a percentage of the
price of the preceding year. The quantity so obtained is called the
link relative.

(2) The chain price relatives of all commodities for each year are
added and the total is divided by the number of commodities. This
gives the average of the chain price relatives.

(3) The average of these chain price relatives gives the percentage
ratio between two consecutive periods. These averages are then
chained together so that the changes of all years are linked to the
first year. The indices so obtained are called chain indices chained
to a common base. The formula is :

Chained Index Number of Current Year = (Chained index number of preceding year × Average chain price relative of current year) / 100


Illustration 4.
Find price index numbers by the chain base method from the following data related
to the wholesale price of wheat for ten years :


Year    Price of wheat per quintal (in Rs.)
2001    50
2002    60
2003    62
2004    65
2005    70
2006    78
2007    82
2008    84
2009    88
2010    90


Solution :

Year    Price of wheat    Chain Price Relative        Chain Base Index No.

2001    50                100                         100
2002    60                (60/50) × 100 = 120         (120 × 100) / 100 = 120
2003    62                (62/60) × 100 = 103.33      (103.33 × 120) / 100 = 124
2004    65                (65/62) × 100 = 104.84      (104.84 × 124) / 100 = 130
2005    70                (70/65) × 100 = 107.69      (107.69 × 130) / 100 = 140
2006    78                (78/70) × 100 = 111.43      (111.43 × 140) / 100 = 156
2007    82                (82/78) × 100 = 105.13      (105.13 × 156) / 100 = 164
2008    84                (84/82) × 100 = 102.44      (102.44 × 164) / 100 = 168
2009    88                (88/84) × 100 = 104.76      (104.76 × 168) / 100 = 176
2010    90                (90/88) × 100 = 102.27      (102.27 × 176) / 100 = 180

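
The two columns of Illustration 4 correspond to two small steps: compute the chain price relative (link relative) of each year on the preceding year, then chain the relatives back to the first year. A minimal Python sketch with hypothetical function names:

```python
def link_relatives(prices):
    """Chain price relative of each year on the preceding year (first year = 100)."""
    links = [100.0]
    for previous, current in zip(prices, prices[1:]):
        links.append(current / previous * 100)
    return links


def chain_indices(links):
    """Chain the link relatives to the first year taken as 100."""
    chained = [100.0]
    for link in links[1:]:
        chained.append(chained[-1] * link / 100)
    return chained


# Data of Illustration 4 (price of wheat per quintal, 2001 to 2010).
wheat_prices = [50, 60, 62, 65, 70, 78, 82, 84, 88, 90]

links = link_relatives(wheat_prices)
chained = chain_indices(links)
print([round(x, 2) for x in links])
print([round(x) for x in chained])
```

For a single commodity the chained index equals the fixed base index on the first year, which is why the last column moves exactly in proportion to the prices.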

Base Conversion

There are two types of base conversion :
(1) From fixed base to chain base (2) From chain base to fixed base

(1) From fixed base to chain base : For converting fixed base
index numbers into chain base index numbers, the index number of the
first year is taken as 100. For each subsequent year, the fixed base
index number of the previous year is taken as the base. The formula
is :

Chain base index number of current year = (Fixed base index of the current year / Fixed base index of the preceding year) × 100



Illustration 5.

Change the fixed base index numbers to chain base index numbers from the following data :

Year               2005   2006   2007   2008   2009
Fixed Index No.    100    105    95     115    102

Solution :
Calculation of Chain Index Number

Year    Fixed Base Index No.    Chain Index No.
2005    100                     100
2006    105                     105/100 × 100 = 105
2007    95                      95/105 × 100 = 90.5
2008    115                     115/95 × 100 = 121
2009    102                     102/115 × 100 = 88.7
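The conversion from fixed base to chain base is a single division per year. The following is a minimal Python sketch of it using the data of Illustration 5; the function name `fixed_to_chain` is an assumption made for the example.

```python
def fixed_to_chain(fixed_indices):
    """Convert fixed base index numbers into chain base index numbers.

    The first year is taken as 100; every later year is expressed as a
    percentage of the fixed base index of the preceding year.
    """
    chain = [100.0]
    for previous, current in zip(fixed_indices, fixed_indices[1:]):
        chain.append(current / previous * 100)
    return chain


fixed = [100, 105, 95, 115, 102]  # 2005 to 2009
chain = fixed_to_chain(fixed)
print([round(x, 2) for x in chain])
```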


(2) From chain base to fixed base : To convert chain base index numbers into fixed base index numbers, (i) the fixed base index number of the first year remains the same as its chain base index number, and (ii) for the other years the calculation is done in the following way :

\[ \text{Fixed base index no. of current year} = \frac{\text{Chain base index no. of current year} \times \text{Fixed base index no. of previous year}}{100} \]


Illustration 6.

From the Chain Base Index Numbers given below, prepare Fixed Base Index Numbers :

Year              2006   2007   2008   2009   2010
C.B. Index No.    80     140    130    100    90

Solution :

Year    Chain Base Index No.    (Current year Chain Index × Pre. year F.B.I.) / 100    Fixed Base Index No.
2006    80                      -                                                      80
2007    140                     140 × 80/100                                           112
2008    130                     130 × 112/100                                          145.6
2009    100                     100 × 145.6/100                                        145.6
2010    90                      90 × 145.6/100                                         131.04


Note : The base year is not given in the question, so the fixed base index number of the first year (2006) is taken to be the same as its chain base index number, 80. If 2006 is taken as the base year, its fixed base index number will be 100, as shown in the next illustration.


Illustration 7.

From the Chain Base Index Numbers given below, prepare Fixed Base Index Numbers, if the year 2004 is taken as base year :

Year              2004   2005   2006   2007   2008   2009   2010
C.B. Index No.    300    240    220    300    280    180    260

Solution :
Conversion to Fixed Base Index Numbers from Chain Index Numbers

Year                C.B. Index No.    Conversion           Fixed Base Index No.
2004 (Base year)    300               -                    100
2005                240               100 × 240/100        240
2006                220               240 × 220/100        528
2007                300               528 × 300/100        1584
2008                280               1584 × 280/100       4435.2
2009                180               4435.2 × 180/100     7983.4
2010                260               7983.4 × 260/100     20756.7


Illustration 8.

A Chain Base Index No. was at 100 in 2006. It increased by 20% in 2007, declined by 5% and 20% in the years 2008 and 2009 respectively and rose by 30% in 2010.

Calculate the Fixed Base Index Numbers for 5 years with 2006 as base year.

Solution : Calculation of Fixed Base Index Numbers

Year                Chain Base Index No.    Conversion          Fixed Base Index No.
2006 (Base year)    100                     -                   100
2007                120                     100 × 120/100       120
2008                95                      120 × 95/100        114
2009                80                      114 × 80/100        91.2
2010                130                     91.2 × 130/100      118.6
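The chain-to-fixed conversion used in Illustrations 6 to 8 can be sketched in Python as below. The function name `chain_to_fixed` and its optional `first_value` argument are assumptions made for this example: `first_value` stands for the fixed base index assigned to the first year (the chain index itself when no base year is stated, or 100 when the first year is the base year).

```python
def chain_to_fixed(chain_indices, first_value=None):
    """Convert chain base index numbers into fixed base index numbers."""
    # With no stated base year, the first fixed index equals the first chain index.
    fixed = [chain_indices[0] if first_value is None else first_value]
    for chain in chain_indices[1:]:
        # Current chain index times preceding fixed index, divided by 100.
        fixed.append(chain * fixed[-1] / 100)
    return fixed


# Illustration 6: base year not stated, so the first value stays at 80.
print([round(x, 2) for x in chain_to_fixed([80, 140, 130, 100, 90])])

# Illustration 8: 2006 is the base year, so the first value is 100.
print([round(x, 2) for x in chain_to_fixed([100, 120, 95, 80, 130], first_value=100)])
```

Passing `first_value=100` with the chain indices of Illustration 7 gives the rapidly growing series shown there, since every chain index in that example is well above 100.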


Exercise (A) 


1. From the following data, construct index for each year by taking the price (in ₹) of a commodity A with 2006 as base :


Year 2004 2005 2006 2007 2008 2009 2010 
Index No. 108 134 156 112 144 204 196 


[ Ans. 69.23, 85.90, 100, 71.79, 92.31, 130.77, 125.64] 
2. The following are the prices of an item : 
Year 2007 2008 2009 2010 


Price (in ₹ per quintal) 216 220 200 218
Calculate index numbers for 2007, 2008 and 2010 based on 2009. 


[ Ans. Index of 2007 = 108, Index of 2008 = 110, Index of 2010 = 109] 

3. The production cost of wheat for 5 years is given in the following table. 
Calculate the index numbers taking average cost as base : 

Year 2006 2007 2008 2009 2010 

Price (₹ per ten kilo) 4.00 5.50 6.75 8.25 10.50

[ Ans. 57, 79, 96, 118, 150] 

4. Calculate chain relative from the following data : 


Year 2006 2007 2008 2009 2010 
Price 175 200 250 300 280 


[ Ans. 100, 114, 125, 120, 93] 
5. Prepare fixed base index numbers from the chain base index numbers given 
below : 


Year 2005 2006 2007 2008 2009 2010 
Chain Base Index 110 160 140 200 150 175 


[ Ans. 110, 176.0, 246.4, 492.8, 739.2, 1,293.6] 
6. From the chain base index numbers given below, prepare fixed base index numbers :


Year 2005 2006 2007 2008 2009 2010 
Chain Base Index 92 102 104 98 103 101 


[ Ans. 92, 94, 98, 96, 99, 100] 
7. From the following fixed base index numbers based on 2005 prices, construct 
chain base index numbers : 


Year 2005 2006 2007 2008 2009 2010 
Index No. 100 110 105 120 130 150 


[ Ans. 100, 110, 95.4, 114.3, 108.3, 115.4] 

8. From the following fixed base index numbers based on 2004, calculate chain base indices :

Year 2005 2006 2007 2008 2009 2010 

Fixed Base Index No. 105 79 56 59 56 50 

[ Ans. 105, 75.2, 70.9, 105.4, 94.9, 89.3] 

9. From the following fixed base index numbers calculate chain base index 
numbers : 


Year 2005 2006 2007 2008 2009 2010 
Fixed Base Index No. 376 392 408 380 392 400 


[ Ans. 100, 104.26, 104.08, 93.14, 103.16, 102.04] 
10. From the following data, calculate chain base index numbers : 


Year 2005 2006 2007 2008 2009 2010 
Production 10 12 15 18 27 43.2 


[ Ans. 100, 120, 125, 120, 150, 160 ] 
11. From the fixed base index numbers given below, prepare chain base index 
numbers with 2006 as base : 


Year 2006 2007 2008 2009 2010 
Fixed Base Index No. 120 130 140 150 180 


[ Ans. 100, 108.3, 107.8, 107.1, 120] 


INDEX NUMBER OF A GROUP OF ITEMS


Equal importance is given to all items in it. It is constructed by two methods :

(1) Simple aggregative method : Under this method the sum of commodity prices in the current year is expressed as a percentage of the sum of the prices of the same commodities in the base year. Thus the formula for the price index number of the current year is :

\[ P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 \]

where $P_{01}$ = Price index number of current year

$\sum P_1$ = Sum of prices of current year

$\sum P_0$ = Sum of prices of base year

The index number of the base year is assumed to be 100; therefore, the ratio is multiplied by 100.


Illustration 1.

Calculate price index number for year 2011 taking the year 2001 as base from the following data :

Commodity    Price in year 2001     Price in year 2011
A            ₹ 250 per quintal      ₹ 350 per quintal
B            ₹ 200 per quintal      ₹ 325 per quintal
C            ₹ 500 per quintal      ₹ 800 per quintal
D            ₹ 400 per quintal      ₹ 700 per quintal
E            ₹ 300 per quintal      ₹ 500 per quintal

Solution :
Price Index Number by Simple Aggregative Method

Commodity    Price in 2001 per Quintal ($P_0$)    Price in 2011 per Quintal ($P_1$)
A            250                                  350
B            200                                  325
C            500                                  800
D            400                                  700
E            300                                  500
Total        $\sum P_0$ = 1650                    $\sum P_1$ = 2675

Price Index Number for the year 2011 :

\[ P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{2675}{1650} \times 100 = 162.12 \]

The price index number of year 2011 is 162.12, taking the price index number of year 2001 as 100. It means that prices increased by 62.12% in 2011 as compared to 2001.
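As a quick check of the simple aggregative method, the calculation for the five commodities of Illustration 1 can be written as a short Python sketch. The function name `simple_aggregative_index` is only illustrative.

```python
def simple_aggregative_index(base_prices, current_prices):
    """Sum of current-year prices as a percentage of the sum of base-year prices."""
    return sum(current_prices) / sum(base_prices) * 100


prices_2001 = [250, 200, 500, 400, 300]  # commodities A to E, per quintal
prices_2011 = [350, 325, 800, 700, 500]
index_2011 = simple_aggregative_index(prices_2001, prices_2011)
print(round(index_2011, 2))
```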


Limitations : Although the calculation of an index number by the simple aggregative method is very simple, the method has a few limitations. These should be taken into account while constructing an index number by this method :

(i) The prices of all the commodities must be expressed in the same unit. If the prices of different commodities are expressed in different units, the price index number will be fallacious. In the above example the prices of all commodities are quoted per quintal; if the prices of some commodities were quoted per quintal and those of others per kg., the price index number would be different.

(ii) This index number is also influenced by the magnitude of the prices. The higher the price of a commodity, the greater is its influence on the index number.

(iii) The relative importance of the various commodities is not taken into account in the form of weights, so this price index number does not reflect the real situation.
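Limitation (i) can be seen numerically. The sketch below reuses the prices of Illustration 1 and then, purely as an assumed variation not given in the text, quotes commodity A per kg instead of per quintal (1 quintal = 100 kg). The underlying price movements are identical, yet the index changes.

```python
def simple_aggregative_index(base_prices, current_prices):
    """Sum of current-year prices as a percentage of the sum of base-year prices."""
    return sum(current_prices) / sum(base_prices) * 100


base = {"A": 250, "B": 200, "C": 500, "D": 400, "E": 300}     # per quintal, 2001
current = {"A": 350, "B": 325, "C": 800, "D": 700, "E": 500}  # per quintal, 2011

index_quintal = simple_aggregative_index(base.values(), current.values())

# Hypothetical change of unit: commodity A quoted per kg (1 quintal = 100 kg).
base_mixed = dict(base, A=base["A"] / 100)
current_mixed = dict(current, A=current["A"] / 100)
index_mixed = simple_aggregative_index(base_mixed.values(), current_mixed.values())

print(round(index_quintal, 2), round(index_mixed, 2))
```

The second figure differs from the first only because commodity A now carries a much smaller price in the totals, which is also an instance of limitation (ii).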


(2) Simple average of price relatives method : Under this method, the current year’s price of each commodity is expressed as a price relative of the base year’s price, taking the price of the base year as 100. Then an average of these price relatives is determined. This simple average of the price relatives is the price index number of the current year. The average may be a simple arithmetic mean or a simple geometric mean. The formula for computing this index number is :

\[ P_{01} = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right)}{N} \quad \text{or} \quad P_{01} = \frac{\sum R}{N} \]

In this formula
$P_{01}$ = Index No. of prices of current year, $P_0$ = Price of base year,
$P_1$ = Price of current year, $N$ = Number of commodities,
$R$ = Price Relative $\left( \frac{P_1}{P_0} \times 100 \right)$

Note : If the geometric mean is used for the construction of the index number, the following formula is used :

\[ \log P_{01} = \frac{\sum \log \left( \frac{P_1}{P_0} \times 100 \right)}{N} = \frac{\sum \log R}{N} \]

\[ P_{01} = \text{Antilog} \left[ \frac{\sum \log R}{N} \right], \quad \text{where } R = \frac{P_1}{P_0} \times 100 \]


Illustration 2.

Calculate price index number using simple average of price relatives method from the following data :

Commodity                   A    B    C    D    E    F
Price in year 2001 (in ₹)   20   30   10   25   40   50
Price in year 2011 (in ₹)   25   30   15   35   45   55

Solution :

Computation of Price Index Number by Simple Average of Price Relatives

Commodity    Price in 2001 ($P_0$)    Price in 2011 ($P_1$)    Price Relative $R = \frac{P_1}{P_0} \times 100$    log R
A            20                       25                       25/20 × 100 = 125                                  2.0969
B            30                       30                       30/30 × 100 = 100                                  2.0000
C            10                       15                       15/10 × 100 = 150                                  2.1761
D            25                       35                       35/25 × 100 = 140                                  2.1461
E            40                       45                       45/40 × 100 = 112.5                                2.0512
F            50                       55                       55/50 × 100 = 110                                  2.0414
Total        N = 6                                             $\sum R$ = 737.5                                   $\sum \log R$ = 12.5117

Applying the formula :

\[ P_{01} = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right)}{N} = \frac{\sum R}{N} \]

Substituting the values, $P_{01} = \frac{737.5}{6} = 122.92$ approximately

Answer : Price index number for 2011 on the base of 2001 is 122.92.

Using Geometric Mean

Formula : $P_{01} = \text{Antilog} \left[ \frac{\sum \log R}{N} \right]$

Substituting the values, $P_{01} = \text{Antilog} \left[ \frac{12.5117}{6} \right]$
= Antilog of 2.0853
= 121.7

Answer : Price index number of year 2011 on the basis of year 2001 is 121.7 by using geometric mean.
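Both averages of Illustration 2 can be reproduced without log tables. The sketch below is only an illustration; the function names are assumptions, and `math.log10` plays the role of the logarithm column in the table.

```python
import math


def price_relatives(base_prices, current_prices):
    """Price relative of each commodity: current price as a percentage of base price."""
    return [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]


def arithmetic_mean_index(relatives):
    """Simple arithmetic mean of the price relatives."""
    return sum(relatives) / len(relatives)


def geometric_mean_index(relatives):
    """Antilog of the mean of the base-10 logarithms of the price relatives."""
    mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
    return 10 ** mean_log


prices_2001 = [20, 30, 10, 25, 40, 50]  # commodities A to F
prices_2011 = [25, 30, 15, 35, 45, 55]
relatives = price_relatives(prices_2001, prices_2011)
am_index = arithmetic_mean_index(relatives)
gm_index = geometric_mean_index(relatives)
print(round(am_index, 2), round(gm_index, 1))
```

The geometric mean comes out slightly below the arithmetic mean, which is always the case unless all price relatives are equal.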


Merits and limitations of simple average of price relatives method : This method has the following three advantages over the simple aggregative method :
(i) It is not influenced by the units in which prices are quoted.
(ii) It gives equal importance to all items. Hence it is not affected by extreme items.
(iii) The index number determined by this method satisfies the unit test.
But this method also has two limitations :
(a) Since it is an unweighted average, the importance of all items in the index number is assumed to be the same.
(b) The index number determined by this method does not satisfy the various criteria of Fisher’s ideal index number.


WEIGHTED INDEX NUMBER 


Under weighted index numbers, the various items or commodities are assigned weights according to their importance. Weighted index numbers are also of two types :

(A) Weighted aggregative index number

(B) Weighted average of relatives index number


(A) Weighted aggregative index number : The basic difference between the simple aggregative and the weighted aggregative method is that in the latter the various items or commodities are actually assigned weights. In weighted index numbers, the quantity of the commodity produced, sold or consumed may be used as the weight. Different experts have proposed various formulae for weighted aggregative index numbers. Some of the important methods are given below :

(a) Laspeyre’s method (b) Paasche’s method

(c) Drobisch and Bowley’s method (d) Marshall-Edgeworth method

(e) Walsh’s method (f) Kelly’s method

(g) Fisher’s ideal method


(a) Weighted aggregative index number by Laspeyre’s method : Laspeyre introduced this method in 1871. In this method, the quantities of the base year are used as weights. According to this method, the following formula is used to find the index number :

\[ P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 \]

In the formula,
$P_{01}$ = Price index of the current year

$P_1$ = Price of the current year

$P_0$ = Price of the base year

$q_0$ = Quantity of the base year

$\sum P_1 q_0$ = Sum of the products of quantity of base year and price of current year

$\sum P_0 q_0$ = Sum of the products of quantity of base year and price of base year.

(b) Weighted aggregative index number by Paasche’s method : The German statistician Paasche used this method for the first time in 1874. Under this method, the quantity of the current year is used as the weight for finding the weighted index. According to this method, the formula for the index number is :

\[ P_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100 \]

In the formula

$P_{01}$ = Price index of current year

$\sum P_1 q_1$ = Sum of products of current year’s price and current year’s quantity

$\sum P_0 q_1$ = Sum of products of base year’s price and current year’s quantity


(c) Drobisch and Bowley’s method : Under this method, the two methods of Laspeyre and Paasche are combined for finding index numbers. The arithmetic mean of Laspeyre’s and Paasche’s index numbers gives the index number of the current year according to Drobisch and Bowley. Under this method the quantities of both the base year and the current year are used as weights. The formula for finding the index number according to this method is :

\[ P_{01} = \frac{\dfrac{\sum P_1 q_0}{\sum P_0 q_0} + \dfrac{\sum P_1 q_1}{\sum P_0 q_1}}{2} \times 100 \quad \text{or} \quad P_{01} = \frac{L + P}{2} \]

where, L = Laspeyre’s Index Number

P = Paasche’s Index Number

This method is not popular because it requires more calculation work.


(d) Marshall-Edgeworth method : Under this method the sum of the quantities of both the current year and the base year is used as weight. According to this method, the formula for finding the index number is :

\[ P_{01} = \frac{\sum P_1 (q_0 + q_1)}{\sum P_0 (q_0 + q_1)} \times 100 \quad \text{or} \quad P_{01} = \frac{\sum P_1 q_0 + \sum P_1 q_1}{\sum P_0 q_0 + \sum P_0 q_1} \times 100 \]


(e) Walsh’s formula : Walsh emphasized the use of the geometric mean of the base year and current year quantities as weights, instead of the arithmetic mean used in Marshall and Edgeworth’s weighted aggregative index number. According to this method, the formula for finding the index number is :

\[ P_{01} = \frac{\sum P_1 \sqrt{q_0 q_1}}{\sum P_0 \sqrt{q_0 q_1}} \times 100 \]


(f) Weighted index number of Kelly’s method : Truman L. Kelly has used the following formula for constructing index numbers :

\[ P_{01} = \frac{\sum P_1 q}{\sum P_0 q} \times 100 \]

In this formula $q$ means the quantities of the year which is selected for the weights. It may be any year, not necessarily the base year or the current year. Since fixed weights are used in this method, it is known as the fixed weight aggregative index.
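The difference between Walsh’s and Kelly’s weights can be made concrete with a small Python sketch. All figures below are hypothetical, chosen only so that the square roots are whole numbers, and the function names are assumptions for the example.

```python
import math


def walsh_index(p0, q0, p1, q1):
    """Walsh's index: weights are geometric means of base and current quantities."""
    weights = [math.sqrt(a * b) for a, b in zip(q0, q1)]
    numerator = sum(p * w for p, w in zip(p1, weights))
    denominator = sum(p * w for p, w in zip(p0, weights))
    return numerator / denominator * 100


def kelly_index(p0, p1, fixed_quantities):
    """Kelly's fixed weight aggregative index: one fixed set of quantities as weights."""
    numerator = sum(p * q for p, q in zip(p1, fixed_quantities))
    denominator = sum(p * q for p, q in zip(p0, fixed_quantities))
    return numerator / denominator * 100


# Hypothetical data for two commodities.
p0, q0 = [10, 5], [4, 1]   # base year prices and quantities
p1, q1 = [12, 8], [9, 16]  # current year prices and quantities
fixed_q = [5, 8]           # quantities of some other year, used as fixed weights

walsh = walsh_index(p0, q0, p1, q1)
kelly = kelly_index(p0, p1, fixed_q)
print(round(walsh, 2), round(kelly, 2))
```

With these figures the Walsh weights are 6 and 4, so the index is 104/80 × 100 = 130, while the Kelly index with the assumed fixed quantities is 124/90 × 100.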

(g) Fisher’s ideal index number : Prof. Irving Fisher proposed one formula for index numbers after a deep study of more than 134 formulae. It is known as Fisher’s ideal index number. Fisher’s ideal index number formula is :

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 \quad \text{or} \quad P_{01} = \sqrt{L \times P} \]

where, L = Laspeyre’s index number
P = Paasche’s index number

Thus Prof. Fisher’s ideal index number is the geometric mean of Laspeyre’s and Paasche’s index numbers.

Fisher’s index number is known as the ideal index number for four reasons :

(i) It is based on the geometric mean, which is considered to be the best average for constructing index numbers.

(ii) It is based on variable weights. The quantities of both the base year and the current year are used as weights in it.

(iii) It satisfies both reversal tests of index numbers : the time reversal test as well as the factor reversal test.

(iv) It is free from bias. The opposite biases associated with Laspeyre’s index and Paasche’s index cancel out when their geometric mean is taken in Fisher’s formula.

The following steps are required for finding the index number by Fisher’s ideal formula :

(i) Multiply the base year’s price ( $P_0$ ) by the base year’s quantity ( $q_0$ ) for each commodity and total these products. Thus we obtain $\sum P_0 q_0$ .

(ii) Multiply the current year’s price ( $P_1$ ) by the base year’s quantity ( $q_0$ ) and add the products. Thus we obtain $\sum P_1 q_0$ .

(iii) Multiply the current year’s price ( $P_1$ ) by the current year’s quantity ( $q_1$ ) and add the products. Thus $\sum P_1 q_1$ is obtained.

(iv) Multiply the base year’s price ( $P_0$ ) by the current year’s quantity ( $q_1$ ) and add the products. Thus we obtain $\sum P_0 q_1$ .

(v) After this, Prof. Fisher’s formula is used :

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 \]

Here, $P_{01}$ = current year’s price index number.

For using Fisher’s ideal formula it is essential that the data relating to prices and quantities of the commodities in each year should be available. If the data relating to price and quantity in any year are not available, the index number cannot be constructed.


Illustration 3.
Find out the price index number for the year 2011 taking 2010 as the base year by Laspeyre’s, Paasche’s, Bowley’s and Marshall-Edgeworth methods :

Commodity    Year 2010              Year 2011
             Price    Quantity      Price    Quantity
A            2        8             4        6
B            5        10            6        5
C            4        14            5        10
D            2        19            2        13

Solution : Calculation of Index Number for Year 2011

Commodity    Base year 2010         Current year 2011
             Price    Quantity      Price    Quantity
             $P_0$    $q_0$         $P_1$    $q_1$       $P_1 q_0$    $P_0 q_0$    $P_1 q_1$    $P_0 q_1$
A            2        8             4        6           32           16           24           12
B            5        10            6        5           60           50           30           25
C            4        14            5        10          70           56           50           40
D            2        19            2        13          38           38           26           26
Total                                                    200          160          130          103

(i) Calculation of index number by Laspeyre’s method :

\[ P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 = \frac{200}{160} \times 100 = 125 \]

Price index number of year 2011 is 125.

(ii) Calculation of index number by Paasche’s method :

\[ P_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100 = \frac{130}{103} \times 100 = 126.21 \]

Price index number of year 2011 is 126.21.

(iii) Calculation of index number by Drobisch and Bowley’s method :

\[ P_{01} = \frac{\dfrac{\sum P_1 q_0}{\sum P_0 q_0} + \dfrac{\sum P_1 q_1}{\sum P_0 q_1}}{2} \times 100 = \frac{\dfrac{200}{160} + \dfrac{130}{103}}{2} \times 100 = \frac{1.25 + 1.262}{2} \times 100 = \frac{2.512}{2} \times 100 = 125.6 \]

Price index number of year 2011 is 125.6.

(iv) Calculation of index number by Marshall-Edgeworth method :

\[ P_{01} = \frac{\sum P_1 (q_0 + q_1)}{\sum P_0 (q_0 + q_1)} \times 100 = \frac{\sum P_1 q_0 + \sum P_1 q_1}{\sum P_0 q_0 + \sum P_0 q_1} \times 100 = \frac{200 + 130}{160 + 103} \times 100 = \frac{330}{263} \times 100 = 125.47 \]

Price index number of year 2011 is 125.47.
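The four weighted aggregative formulas of Illustration 3 all use the same four totals, so they can be computed together. The following Python sketch is illustrative only; the helper name `cross_totals` is an assumption.

```python
def cross_totals(p0, q0, p1, q1):
    """Return the four totals: sum P1q0, sum P0q0, sum P1q1, sum P0q1."""
    p1q0 = sum(p * q for p, q in zip(p1, q0))
    p0q0 = sum(p * q for p, q in zip(p0, q0))
    p1q1 = sum(p * q for p, q in zip(p1, q1))
    p0q1 = sum(p * q for p, q in zip(p0, q1))
    return p1q0, p0q0, p1q1, p0q1


# Commodities A to D: base year 2010 and current year 2011.
p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]

p1q0, p0q0, p1q1, p0q1 = cross_totals(p0, q0, p1, q1)

laspeyres = p1q0 / p0q0 * 100                          # base year quantities as weights
paasche = p1q1 / p0q1 * 100                            # current year quantities as weights
bowley = (laspeyres + paasche) / 2                     # arithmetic mean of the two
marshall_edgeworth = (p1q0 + p1q1) / (p0q0 + p0q1) * 100

print(p1q0, p0q0, p1q1, p0q1)
print(round(laspeyres, 2), round(paasche, 2), round(bowley, 2), round(marshall_edgeworth, 2))
```

Small differences in the last decimal place compared with a hand calculation come from rounding at intermediate steps.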
Illustration 4.

Find out Fisher’s Ideal Index Number from the following data :

Commodity    Quantity                      Price
             Base year    Current year     Base year          Current year
A            10 kg        5 kg             ₹ 2.00 per kg      ₹ 6.00 per kg
B            15 kg        10 kg            ₹ 1.00 per kg      ₹ 4.00 per kg
C            20 kg        10 kg            ₹ 0.50 per kg      ₹ 1.50 per kg

Solution :
Calculation of Fisher’s Ideal Index Number

Commodity    Base year              Current year
             Price    Quantity      Price    Quantity
             $P_0$    $q_0$         $P_1$    $q_1$       $P_0 q_0$    $P_0 q_1$    $P_1 q_0$    $P_1 q_1$
A            2.00     10            6.00     5           20           10           60           30
B            1.00     15            4.00     10          15           10           60           40
C            0.50     20            1.50     10          10           5            30           15
Total                                                    45           25           150          85

Applying Fisher’s Ideal Index Number formula

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 \]

where $P_{01}$ = Price index number of current year

$\sum P_1 q_0$ = Total of products of current year’s price and base year’s quantity

$\sum P_0 q_0$ = Total of products of price of base year and quantity of base year

$\sum P_1 q_1$ = Total of products of current year’s price and current year’s quantity

$\sum P_0 q_1$ = Total of products of base year’s price and current year’s quantity

Substituting the values,

\[ P_{01} = \sqrt{\frac{150}{45} \times \frac{85}{25}} \times 100 = 336.7 \]

Answer : Current year’s Index Number is 336.7.
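Fisher’s formula for the data of Illustration 4 can be checked with the short Python sketch below. The function name `fisher_index` is an assumption made for the example.

```python
import math


def fisher_index(p0, q0, p1, q1):
    """Fisher's ideal price index: geometric mean of the Laspeyre and Paasche ratios."""
    laspeyres = sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
    paasche = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
    return math.sqrt(laspeyres * paasche) * 100


# Commodities A, B, C.
p0, q0 = [2.00, 1.00, 0.50], [10, 15, 20]  # base year price and quantity
p1, q1 = [6.00, 4.00, 1.50], [5, 10, 10]   # current year price and quantity

fisher = fisher_index(p0, q0, p1, q1)
print(round(fisher, 1))
```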


Illustration 5.
Find out Fisher’s Ideal Index Number from the following data :

Commodity    Base year                                 Current year
             Price (per unit)    Total Exp. (in ₹)     Price (per unit)    Total Exp. (in ₹)
A            2                   40                    5                   75
B            4                   16                    8                   40
C            1                   10                    2                   24
D            5                   25                    10                  60
(Raipur 1988; Bilaspur 2006; Gwalior 2006; Ujjain 2006)

Solution :

Here Quantity = Total Exp./Price

Calculation of Fisher’s Ideal Index Number

Commodity    Base year            Current year
             Price    Quant.      Price    Quant.
             $P_0$    $q_0$       $P_1$    $q_1$      $P_0 q_0$    $P_0 q_1$    $P_1 q_0$    $P_1 q_1$
A            2        20          5        15         40           30           100          75
B            4        4           8        5          16           20           32           40
C            1        10          2        12         10           12           20           24
D            5        5           10       6          25           30           50           60
Total                                                 91           92           202          199

Applying the formula of Fisher’s ideal index number,

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 \]

Substituting the values

\[ P_{01} = \sqrt{\frac{202}{91} \times \frac{199}{92}} \times 100 \approx 219.1 \]

Price index number of current year is approximately 219.1.
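When only prices and total expenditure are given, as in Illustration 5, the quantities have to be derived first. The sketch below shows that extra step in Python; the variable names are assumptions for the example.

```python
import math

# Commodities A to D: price per unit and total expenditure in each year.
p0, exp0 = [2, 4, 1, 5], [40, 16, 10, 25]   # base year
p1, exp1 = [5, 8, 2, 10], [75, 40, 24, 60]  # current year

# Quantity = Total expenditure / Price
q0 = [e / p for e, p in zip(exp0, p0)]
q1 = [e / p for e, p in zip(exp1, p1)]

p0q0 = sum(exp0)                               # expenditure is already P0 * q0
p1q1 = sum(exp1)                               # expenditure is already P1 * q1
p0q1 = sum(p * q for p, q in zip(p0, q1))
p1q0 = sum(p * q for p, q in zip(p1, q0))

fisher = math.sqrt((p1q0 / p0q0) * (p1q1 / p0q1)) * 100
print(q0, q1)
print(p0q0, p0q1, p1q0, p1q1, round(fisher, 2))
```

Note that the two "own year" totals need no multiplication at all, because the given expenditures are exactly those products.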


Illustration 6.
Find out a suitable Index Number for the purpose of comparison from the following data :

Year    Rice                  Wheat                 Sugar
        Price    Quantity     Price    Quantity     Price    Quantity
2001    4        50           3        10           2        5
2011    10       40           8        8            4        4

(Bilaspur 2005)

Solution :

Fisher’s ideal index number is the suitable index number, whose calculation is given below :

Commodity    Base year 2001         Current year 2011
             Price    Quant.        Price    Quant.
             $P_0$    $q_0$         $P_1$    $q_1$       $P_0 q_0$    $P_0 q_1$    $P_1 q_0$    $P_1 q_1$
Rice         4        50            10       40          200          160          500          400
Wheat        3        10            8        8           30           24           80           64
Sugar        2        5             4        4           10           8            20           16
Total                                                    240          192          600          480

Fisher’s ideal index number

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 \]

Substituting the values,

\[ P_{01} = \sqrt{\frac{600}{240} \times \frac{480}{192}} \times 100 = \sqrt{2.5 \times 2.5} \times 100 = 2.5 \times 100 = 250 \]

(B) Weighted average of relatives index number : Under this method, the price relative $\left( \frac{P_1}{P_0} \times 100 \right)$ of the current year’s price is determined on the basis of the price of the base year. These price relatives are multiplied by the weights of the related commodities. Here the weight of a commodity means the total expenditure ( $P_0 q_0$ ) on that commodity. The total of the products of price relatives and weights is divided by the total of the weights. The quotient so obtained is the price index number of the current year. The weighted arithmetic mean or the weighted geometric mean may be used for finding the average.

The formula for finding the price index number by using the weighted arithmetic mean is :

\[ P_{01} = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right) W}{\sum W} \quad \text{or} \quad P_{01} = \frac{\sum RW}{\sum W} \]

Here R means price relative, i.e., $\frac{P_1}{P_0} \times 100$

W means weight.

If the base year’s quantity ( $q_0$ ) is taken, the weight will be W = $P_0 q_0$ , and if the current year’s quantity is taken, the weight will be W = $P_0 q_1$ .

The following steps are required for finding the current year’s index number by using the weighted geometric mean :

(i) Find the price relative of the current year’s price on the basis of the base year’s price for each commodity. Use the formula $R = \frac{P_1}{P_0} \times 100$ for it.

(ii) Find the logarithm of each price relative.

(iii) Multiply each logarithm by the weight of the related commodity. Here the product of price and quantity of the base year ( $P_0 q_0$ ) is used as the weight (W).

(iv) Add the products of logarithms and weights and divide this total by the total of the weights.

(v) Find the antilogarithm of the quotient so obtained. This will be the price index number by the weighted price relative method. In this method the following formula is used for finding the price index number :

\[ P_{01} = \text{Antilog} \left[ \frac{\sum (\log R \times W)}{\sum W} \right] \]

Here R means price relative, i.e., $\frac{P_1}{P_0} \times 100$ and W = weight.

If the base year’s quantity ( $q_0$ ) is taken then W means ( $P_0 q_0$ ). On the other hand, if the current year’s quantity ( $q_1$ ) is taken then W will be $P_0 q_1$ . With the weighted arithmetic mean, the former weights give the same result as Laspeyre’s price index number and the latter the same result as Paasche’s price index number.


Illustration 7.

Find out the price index number of the current year from the following data by the weighted average of price relatives method and show the difference between the results obtained by using the arithmetic mean and the geometric mean :

Commodity    Price of base year    Quantity of base year    Price of current year
A            ₹ 3.00 per kg         20 kg                    ₹ 4.00 per kg
B            ₹ 1.50 per kg         40 kg                    ₹ 1.60 per kg
C            ₹ 1.00 per kg         10 kg                    ₹ 1.50 per kg

Solution :

Calculation of Price Index Number by Using Weighted Arithmetic Mean of Price Relatives

Commodity    Base year            Current year    Weight         Price Relative                        Product of price relative and weight
             Price    Quant.      Price           W              $R = \frac{P_1}{P_0} \times 100$      RW
             $P_0$    $q_0$       $P_1$           $P_0 q_0$
A            3.00     20          4.00            60             4.00/3.00 × 100                       4.00/3.00 × 100 × 60 = 8000
B            1.50     40          1.60            60             1.60/1.50 × 100                       1.60/1.50 × 100 × 60 = 6400
C            1.00     10          1.50            10             1.50/1.00 × 100 = 150                 150 × 10 = 1500
Total                                             $\sum W$ = 130                                       $\sum RW$ = 15900

Applying the formula for finding the price index number of the current year,

\[ P_{01} = \frac{\sum \left( \frac{P_1}{P_0} \times 100 \right) W}{\sum W} = \frac{\sum RW}{\sum W} \]

Substituting the values, $P_{01} = \frac{15900}{130} = 122.31$

Using the weighted arithmetic mean of price relatives, the price index number of the current year is 122.31. In other words, prices have increased by 22.31% in the current year as compared to the base year.

Calculation of Price Index Number by Using Weighted Geometric Mean of Price Relatives

Commodity    Base year            Current year    Weight         Price Relative         log R     Product of logarithm and weight
             Price    Quant.      Price           W              R                                log R × W
             $P_0$    $q_0$       $P_1$           $P_0 q_0$
A            3.00     20          4.00            60             4.00/3.00 × 100        2.1248    127.490
B            1.50     40          1.60            60             1.60/1.50 × 100        2.0282    121.690
C            1.00     10          1.50            10             1.50/1.00 × 100        2.1761    21.761
Total                                             $\sum W$ = 130                                  $\sum (\log R \times W)$ = 270.941

Applying the formula of price index number for the current year,

\[ P_{01} = \text{Antilog} \left[ \frac{\sum (\log R \times W)}{\sum W} \right], \quad \text{where } R = \frac{P_1}{P_0} \times 100 \]

Substituting the values,

\[ P_{01} = \text{Antilog} \left[ \frac{270.941}{130} \right] \]

$P_{01}$ = Antilog of 2.0842

$P_{01}$ = 121.38

Using the weighted arithmetic mean, the current year’s price index number is 122.31, while using the weighted geometric mean it is 121.38.
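The two weighted averages of Illustration 7 can be compared side by side in a short Python sketch. The function names are assumptions for the example, and `math.log10` replaces the four-figure log table, so the last decimal of the geometric result may differ slightly from a hand calculation.

```python
import math


def weighted_am_index(relatives, weights):
    """Weighted arithmetic mean of price relatives: sum(RW) / sum(W)."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


def weighted_gm_index(relatives, weights):
    """Weighted geometric mean: antilog of sum(log R * W) / sum(W)."""
    weighted_logs = sum(math.log10(r) * w for r, w in zip(relatives, weights))
    return 10 ** (weighted_logs / sum(weights))


# Commodities A, B, C.
p0, q0 = [3.00, 1.50, 1.00], [20, 40, 10]  # base year price and quantity
p1 = [4.00, 1.60, 1.50]                    # current year price

weights = [p * q for p, q in zip(p0, q0)]          # W = P0 * q0
relatives = [b / a * 100 for a, b in zip(p0, p1)]  # R = P1 / P0 * 100

am_index = weighted_am_index(relatives, weights)
gm_index = weighted_gm_index(relatives, weights)
print(weights, round(am_index, 2), round(gm_index, 2))
```

With base year expenditure as weights, the arithmetic result is the same as Laspeyre’s index for these data, since each product RW reduces to P1 × q0 × 100.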


REVERSIBILITY TEST OR TEST OF ADEQUACY 


An ideal index number must satisfy the reversibility tests. Prof. Irving Fisher’s ideal index number satisfies them. Reversibility tests are of two types :

(1) Time reversal test (2) Factor reversal test 


1. Time Reversal Test 


Fisher stated, “Time reversal test means a test in which the formula for calculating an index number should be such that it will give the same ratio between one point of comparison and the other, no matter which of the two is taken as base.” In other words, the index number calculated forwards should be the reciprocal of the index number calculated backwards. In simple words, if the index number of the current year’s prices is determined with the base year as base, and the index number of the base year’s prices is determined with the current year as base, then these two index numbers should be reciprocals of one another and their product should be 1.

In the form of formula, $P_{01} \times P_{10} = 1$

where, $P_{01}$ = Price index number of current year, assuming the price of base year as base

$P_{10}$ = Price index number of base year, assuming price of current year as base

Prof. Irving Fisher’s ideal formula of index number satisfies the time reversal test :

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}}, \qquad P_{10} = \sqrt{\frac{\sum P_0 q_1}{\sum P_1 q_1} \times \frac{\sum P_0 q_0}{\sum P_1 q_0}} \]

\[ P_{01} \times P_{10} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1} \times \frac{\sum P_0 q_1}{\sum P_1 q_1} \times \frac{\sum P_0 q_0}{\sum P_1 q_0}} = 1 \]


2. Factor Reversal Test

According to Prof. Irving Fisher’s factor reversal test, the product of the price index and the quantity index should be the true measure of the change in total value, i.e., $V_{01} = P_{01} \times Q_{01}$

Here, $V_{01}$ = value index number

It means $V_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_0}$

$P_{01}$ = Current year’s price index number assuming base year’s price as base,

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \]

$Q_{01}$ = Current year’s quantity index number assuming base year’s quantity as base

\[ Q_{01} = \sqrt{\frac{\sum q_1 P_0}{\sum q_0 P_0} \times \frac{\sum q_1 P_1}{\sum q_0 P_1}} \]

According to the factor reversal test, $V_{01} = P_{01} \times Q_{01}$

\[ P_{01} \times Q_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times \sqrt{\frac{\sum q_1 P_0}{\sum q_0 P_0} \times \frac{\sum q_1 P_1}{\sum q_0 P_1}} = \sqrt{\frac{\sum P_1 q_1}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_0}} = \frac{\sum P_1 q_1}{\sum P_0 q_0} = V_{01} \]

Fisher stated, “Just as each formula should permit interchange of two items without giving inconsistent results, so it ought to permit interchanging the prices and quantities without giving inconsistent results, i.e., the two results multiplied together should give true value ratio.” In simple words, the change in price multiplied by the change in quantity should be equal to the total change in value.

Prof. Fisher’s ideal index formula satisfies the factor reversal test.


3. Circular Test

There is another test of the adequacy of an index number formula, known as the circular test. This test is an extension of the time reversal test. According to this test, if index numbers are determined for year 1 on base 0, for year 2 on base 1, for year 3 on base 2 and so on, and finally for year 0 on the last year as base, then their product should be 1.

According to formula, $P_{01} \times P_{12} \times P_{23} \times \dots \times P_{(n-1)n} \times P_{n0} = 1$

According to this test, the purpose of index numbers is not only to compare two points of time but also to measure the changes in prices over a period covering several points of time, so the base year should be allowed to vary. The circular test emphasizes that index numbers should be adjustable from time to time without referring back each time to the original base. For example, if the price index number of 1985 on the base of 1980 was 200 and the price index number of 1980 on the base of 1975 was also 200, then the adjusted price index number of 1985 on the base of 1975 should be 400. That is, if the price index number of 1985 is constructed directly by taking the prices of 1975 as base, the result should also be 400. Laspeyre’s, Paasche’s and Fisher’s ideal index numbers do not satisfy this test. Only the index numbers determined by the simple aggregative method and the fixed weight aggregative method satisfy this condition.
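The circular test can be tried out numerically. The three-period data below are hypothetical (they do not come from the text), and the function names are assumptions; the sketch simply multiplies the three indices around the circle 0 → 1 → 2 → 0 for the simple aggregative method and for Fisher’s formula.

```python
import math


def simple_aggregative(p_base, p_cur):
    """Simple aggregative price index as a ratio (not multiplied by 100)."""
    return sum(p_cur) / sum(p_base)


def fisher(p_base, q_base, p_cur, q_cur):
    """Fisher's ideal price index as a ratio (not multiplied by 100)."""
    laspeyres = (sum(p * q for p, q in zip(p_cur, q_base))
                 / sum(p * q for p, q in zip(p_base, q_base)))
    paasche = (sum(p * q for p, q in zip(p_cur, q_cur))
               / sum(p * q for p, q in zip(p_base, q_cur)))
    return math.sqrt(laspeyres * paasche)


# Hypothetical prices and quantities of two commodities in years 0, 1 and 2.
p = [[1, 1], [2, 1], [1, 4]]
q = [[1, 1], [1, 2], [3, 1]]

# Product around the circle: P01 x P12 x P20.
simple_product = (simple_aggregative(p[0], p[1])
                  * simple_aggregative(p[1], p[2])
                  * simple_aggregative(p[2], p[0]))
fisher_product = (fisher(p[0], q[0], p[1], q[1])
                  * fisher(p[1], q[1], p[2], q[2])
                  * fisher(p[2], q[2], p[0], q[0]))

print(round(simple_product, 6), round(fisher_product, 6))
```

For these figures the simple aggregative product is 1, while the Fisher product is a little above 1, which illustrates the statement that Fisher’s formula does not in general satisfy the circular test.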


Illustration 8.
From the following data prove that Fisher’s Index Number satisfies both the time reversal test and the factor reversal test :

Commodity     Price                                    Quantity
              Base year (in ₹)    Current year (in ₹)  Base year    Current year
              $P_0$               $P_1$                $q_0$        $q_1$
Rice          6                   9                    50           55
Wheat         2                   3                    100          125
Sugar         4                   6                    60           65
Edible oil    10                  14                   30           25

Solution :
Calculation of Fisher’s Ideal Index Number

Commodity     Base year            Current year
              Price    Quant.      Price    Quant.
              $P_0$    $q_0$       $P_1$    $q_1$      $P_0 q_0$    $P_0 q_1$    $P_1 q_0$    $P_1 q_1$
Rice          6        50          9        55         300          330          450          495
Wheat         2        100         3        125        200          250          300          375
Sugar         4        60          6        65         240          260          360          390
Edible oil    10       30          14       25         300          250          420          350
Total                                                  1040         1090         1530         1610

Applying the formula of Fisher’s ideal index number

Current year’s price index number :

\[ P_{01} = \sqrt{\frac{1530}{1040} \times \frac{1610}{1090}} \times 100 \]

Base year’s price index number, assuming current year as base :

\[ P_{10} = \sqrt{\frac{1040}{1530} \times \frac{1090}{1610}} \times 100 \]

Time Reversal Test : $P_{01} \times P_{10} = 1$
Substituting the values (leaving the factor 100)

\[ P_{01} \times P_{10} = \sqrt{\frac{1530}{1040} \times \frac{1610}{1090} \times \frac{1040}{1530} \times \frac{1090}{1610}} = 1 \]

Thus it is clear that Fisher’s ideal index number satisfies the time reversal test $P_{01} \times P_{10} = 1$.

Assuming the quantity of the base year as base, the quantity index number for the current year according to Fisher’s ideal index number is :

\[ Q_{01} = \sqrt{\frac{\sum q_1 P_0}{\sum q_0 P_0} \times \frac{\sum q_1 P_1}{\sum q_0 P_1}} \times 100 \]

Substituting the values

\[ Q_{01} = \sqrt{\frac{1090}{1040} \times \frac{1610}{1530}} \times 100 \]

Factor Reversal Test (leaving the factor 100)

\[ P_{01} \times Q_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times \sqrt{\frac{\sum q_1 P_0}{\sum q_0 P_0} \times \frac{\sum q_1 P_1}{\sum q_0 P_1}} \]

Substituting all the values,

\[ P_{01} \times Q_{01} = \sqrt{\frac{1530}{1040} \times \frac{1610}{1090}} \times \sqrt{\frac{1090}{1040} \times \frac{1610}{1530}} = \sqrt{\frac{1610}{1040} \times \frac{1610}{1040}} = \frac{1610}{1040} \]

\[ = \frac{\sum P_1 q_1}{\sum P_0 q_0} = V_{01} = \text{Value Index Number} \]

The formula of Fisher’s ideal index number satisfies the factor reversal test. Thus it is clear that Fisher’s index number satisfies both the time reversal test and the factor reversal test.
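Both tests of Illustration 8 can be verified numerically. In the Python sketch below (function and variable names are assumptions for the example) the same function is reused three times: swapping the two years gives the backward index, and swapping prices with quantities gives the quantity index.

```python
import math


def fisher(p_base, q_base, p_cur, q_cur):
    """Fisher's ideal index as a ratio (leaving the factor 100)."""
    laspeyres = (sum(p * q for p, q in zip(p_cur, q_base))
                 / sum(p * q for p, q in zip(p_base, q_base)))
    paasche = (sum(p * q for p, q in zip(p_cur, q_cur))
               / sum(p * q for p, q in zip(p_base, q_cur)))
    return math.sqrt(laspeyres * paasche)


# Rice, wheat, sugar, edible oil.
p0, q0 = [6, 2, 4, 10], [50, 100, 60, 30]   # base year
p1, q1 = [9, 3, 6, 14], [55, 125, 65, 25]   # current year

p01 = fisher(p0, q0, p1, q1)   # price index, base year as base
p10 = fisher(p1, q1, p0, q0)   # price index, current year as base (years swapped)
q01 = fisher(q0, p0, q1, p1)   # quantity index (prices and quantities swapped)

value_ratio = (sum(p * q for p, q in zip(p1, q1))
               / sum(p * q for p, q in zip(p0, q0)))

print(math.isclose(p01 * p10, 1.0))          # time reversal test
print(math.isclose(p01 * q01, value_ratio))  # factor reversal test
```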


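Illustration 9.

Using the following data prove that Fisher’s ideal index number formula satisfies the time reversal and factor reversal tests :

Commodity    2010                  2011
             Price    Quantity     Price    Quantity
V            6        50           10       56
W            2        100          2        120
X            4        60           6        60
Y            10       30           12       24
Z            8        40           12       36

(Jabalpur 2007)
Solution :
Construction of Fisher’s ideal index number

Commodity    2010                 2011
             Price    Quant.      Price    Quant.
             $P_0$    $q_0$       $P_1$    $q_1$      $P_0 q_0$    $P_1 q_1$    $P_0 q_1$    $P_1 q_0$
V            6        50          10       56         300          560          336          500
W            2        100         2        120        200          240          240          200
X            4        60          6        60         240          360          240          360
Y            10       30          12       24         300          288          240          360
Z            8        40          12       36         320          432          288          480
Total                                                 1360         1880         1344         1900

Fisher’s index number for 2011

\[ P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times 100 = 139.79 \]

Now calculation of Fisher’s index number for 2010 when the base year is 2011.

\[ P_{10} = \sqrt{\frac{\sum P_0 q_0}{\sum P_1 q_0} \times \frac{\sum P_0 q_1}{\sum P_1 q_1}} \times 100 = \sqrt{\frac{1360}{1900} \times \frac{1344}{1880}} \times 100 = 71.53 \]

Now

\[ P_{01} \times P_{10} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times \sqrt{\frac{1360}{1900} \times \frac{1344}{1880}} = 1 \quad \text{(Leaving the coefficient 100)} \]

Thus this index number satisfies the time reversal test.

Factor Reversal Test :
Quantity Index Number = $Q_{01}$

\[ Q_{01} = \sqrt{\frac{\sum q_1 P_0}{\sum q_0 P_0} \times \frac{\sum q_1 P_1}{\sum q_0 P_1}} = \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}} \]

\[ P_{01} \times Q_{01} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}} = \frac{1880}{1360} \quad \text{(Leaving the coefficient 100)} \]

\[ = \frac{\sum P_1 q_1}{\sum P_0 q_0} = V_{01} = \text{Value Index Number} \]

Thus this Fisher’s ideal index number satisfies the factor reversal test.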
Quantity Index Number


Just as a price index number measures the change in the prices of
commodities in the current year compared with the base year, a
quantity index number measures the change in the production,
distribution, consumption, etc. of commodities in the current year
compared with the base year. Indices of industrial production are a
very useful form of quantity index number; they are used as
indicators of the level of output in the economy, so the progress of
an economy may be studied comparatively by using them. Quantity
index numbers are also useful for measuring the standard of living in
terms of the quantity of production. The change in the quantity (q)
of the current year compared with the base year can be studied by
constructing a quantity index number, and price (p) or expenditure
(p0 q0) is used as the weight while constructing a weighted index
number. The following formulae may be used for finding a quantity
index number.
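(1) Aggregative method :

Simple Index Number :

$$Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$$

(i) Laspeyre’s formula :

$$Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$$

(ii) Paasche’s formula :

$$Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$$

(iii) Fisher’s formula :

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100$$

(2) Relative method :

Simple index number :

$$Q_{01} = \frac{\sum \left( \frac{q_1}{q_0} \times 100 \right)}{N}$$

Weighted index number :

$$Q_{01} = \frac{\sum \left( \frac{q_1}{q_0} \times 100 \right) p_0 q_0}{\sum p_0 q_0}$$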


Illustration 10.
Find out Quantity index number by Laspeyre’s, Paasche’s and Fisher’s formula
from the following data :

                        2009                              2011
Commodity   Price per unit (in ₹)   Value in ₹   Price per unit (in ₹)   Value in ₹
X                    5                  50                 4                 48
Y                    8                  40                 7                 49
Z                    6                  18                 5                 20


Solution :

For finding the quantity index number, we first have to determine the
quantity of each commodity from its value and its price per unit :

$$\text{Quantity of commodity} = \frac{\text{Value of the commodity}}{\text{Price per unit of the commodity}}$$

             Base year 2009      Current year 2011
Commodity    Price    Quant.     Price    Quant.     q0p0    q1p1    q1p0    q0p1
             (p0)     (q0)       (p1)     (q1)
X              5       10          4       12         50      48      60      40
Y              8        5          7        7         40      49      56      35
Z              6        3          5        4         18      20      24      15
Total                                                108     117     140      90
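Laspeyre’s quantity index number

$$Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{140}{108} \times 100 = 129.63$$

Paasche’s quantity index number

$$Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{117}{90} \times 100 = 130$$

Fisher’s quantity index number

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{140}{108} \times \frac{117}{90}} \times 100 = 129.81$$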
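As a rough check of Illustration 10, the following Python sketch derives the quantities from value and price and then applies the three quantity index formulae. The function name `quantity_indices` and the dictionary layout are assumptions chosen for this example only.

```python
from math import sqrt

# commodity: (price 2009, value 2009, price 2011, value 2011), from Illustration 10
data = {"X": (5, 50, 4, 48), "Y": (8, 40, 7, 49), "Z": (6, 18, 5, 20)}

def quantity_indices(data):
    """Return (Laspeyres, Paasche, Fisher) quantity index numbers."""
    q0p0 = q1p1 = q1p0 = q0p1 = 0.0
    for p0, v0, p1, v1 in data.values():
        q0 = v0 / p0          # quantity = value / price per unit
        q1 = v1 / p1
        q0p0 += q0 * p0
        q1p1 += q1 * p1
        q1p0 += q1 * p0
        q0p1 += q0 * p1
    laspeyres = q1p0 / q0p0 * 100
    paasche = q1p1 / q0p1 * 100
    fisher = sqrt(laspeyres * paasche)   # geometric mean of the two
    return laspeyres, paasche, fisher

laspeyres, paasche, fisher = quantity_indices(data)
print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```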


Exercise (B) 


1. Calculate quantity index and price index No. using Fisher’s formula from the 
data given below : 
Commodities Base year Current year 
Price Quantity Price Quantity 
A 5 10 4 12
B 8 6 7 7
C 6 3 5 4
(Indore 2007; Rewa 2005) 


[ Ans. Q 91 = 89.32, P 94 = 112.92] 


2. Following data relate to prices and quantities of some commodities. Construct 
a suitable index number and examine if it satisfies the time reversal test or not : 


Commodities 2001 2011 
Price Quantity Price Quantity 
X 60 32 50 40 

Y 35 30 40 25 

Z55 15 50 18 


[ Ans. Index No. = 92.5 ; Time Reversal : $\sqrt{\frac{3550}{3795} \times \frac{3900}{4265}} \times \sqrt{\frac{3795}{3550} \times \frac{4265}{3900}} = 1$ ]


3. Construct Fisher’s Ideal Index Number on the basis of the following 
informations : 


Commodities Base year Current year 
Price Quantity Price Quantity 

Rice 9.3 100 4.5 90 

Wheat 6.4 11 3.7 10 

Gram 5.1 5 2.7 3 


[ Ans. 49.13 ] 
4. Construct Fisher’s Ideal Index Number on the basis of the following 
informations : 


Commodities Base year Current year 
Price Quantity Price Quantity 
A 2 4 6 5
B 4 5 8 4
C 6 2 9 3
D 8 1 6 2
E 10 1 5 2


[ Ans. 149.16] 


5. Construct Fisher’s Ideal Index Number from the following data : 


Item Year (2001) Year (2011) 
Price Quantity Price Quantity 
A 4 3 6 2
B 5 4 6 4
C 7 2 9 2
D 21 80 10 15


[ Ans. Fisher’s index number of the year 2011 is 53.36] 


6. Construct Fisher’s Index Number from the following data and show how do 
they satisfy time reversal test and factor reversal test : 


Commodities 2001 2011 
Price Quantity Price Quantity 
A 4 20 6 10
B 3 15 5 20
C 2 25 3 15
D 5 10 4 40
(Jabalpur 2006) 


[ Ans. 124.91, 123.45] 


7. Find out Fisher’s Ideal Index Number from the following data : 
Commodities 2001 2011 

Price Total Expenditure Quantity Total Expenditure 

A 20 30 5 70 

B 4 20 6 30 

C 10 30 3 24 

D 5 10 8 32


[ Ans. 83.48]

[ Hint : Calculate quantity for the year 2001 and price for the year 2011.
Then find Fisher’s index number.]

8. Calculate weighted Aggregative Price Index Number taking 2000 as base from 


the following data : 
Commodities   Quantity consumed   Unit      Price in base year    Price in current year
                                            per quintal (in ₹)    per quintal (in ₹)
Wheat         4                   Quintal          80                    100
Rice          1                   Quintal         120                    250
Gram          1                   Quintal         100                    150
Pulses        2                   Quintal         200                    300

[ Ans. 148.94] 

9. Calculate Laspeyre’s and Fisher’s Quantity Index No. from the data given 
below : 

Item Base year Current year 


Price Quantity Price Quantity 
A 5 50 10 56
B 3 100 4 120
C 4 60 6 60
D 11 30 14 24 
E 7 40 10 36 
(Sagar 2006) 


[ Ans. 99.7 ; 100.2] 

10 . Construct Fisher’s Ideal Index No. from the following data and show that it 
satisfies the Time Reversal and Factor Reversal Tests. 

Commodities Base year Current year 


Price Quantity Price Quantity 
W 8 50 12 40 

X 4 30 8 20 

Y 10 25 12 40 

Z 5 50 4 100


[ Ans. 123.93] 
11. Construct Fisher’s Ideal Index Number from the following data :

Commodities        2010                        2011
              Price   Total Expenditure   Price   Total Expenditure
W               4           80             10          150
X               8           32             16           80
Y               2           20              4           48
Z              10           50             20          120


[ Ans. 219.12] 
12. Find Fisher’s Index Number from the data given below : 
Commodities Base year Current year 


Price Quantity Price Quantity 
A403 602 
B 50 4 604 
C 702902 
D 203105 


[ Ans. 186.05] (Indore 2006) 
13. From the following data, what index number formula would you use for 
purposes of comparison ? Give reasons : 


Year Rice Wheat Jowar 

Price Qty. Price Qty. Price Qty. 
2010 4 50 3 10 2 5

2011 10 40 1 8 5 4


[ Ans. Fisher’s index number 222.92] (Bhopal 2006) 
14. From the following data, which index number formula would you use for 
purposes of comparison ? Give reasons : 


Year Rice Wheat Jowar 

Price Qty. Price Qty. Price Qty. 

2010 9.3 100 6.4 11 5.1 5

2011 4.5 90 3.7 10 2.7 3 

[ Ans. Fisher’s index number 49.13] 

15. Find Fisher’s index number from the data given below : 
Commodities Base year Current year 
Price (₹) Qty. (kg) Price (₹) Qty. (kg)
A 3 50 5 56

B 1 100 1 120

C 2 60 3 60


D 5 30 6 24 
E 4 40 6 36 


[ Ans. 139.72] 


16. Find Fisher’s Ideal Index No. with the help of following data : 
Commodities Total production in district ‘A’ Price yield in district ‘A’ 
(in thousand tones) (per quintal) 

2009-10 2010-11 2009-10 2010-11 

Rice 71 26 56 50 

Bajra 107 83 32 30 

Maize 62 48 41 28 


[ Ans. 85.4] (Sagar 2006) 
COST OF LIVING INDEX NUMBER OR 
CONSUMER’S 
PRICE INDEX NUMBER 


Cost of living index number is also known as consumer’s price
index number. The purpose of its construction is to find the direction
and degree of change in the prices paid by the consumers of a
specific group over a period of time. A general price index number
does not make clear what effect a change in the general price level
has on the cost of living of different classes of people. The reason
is that people of different classes of society consume different
types of commodities in different proportions, so changes in prices
affect them in different ways. For example, the consumption patterns
of rich, poor and middle-class people vary widely. Not only this, the
consumption habits of people of the same class differ from place to
place. Hence the consumer’s price index number shows the effect of
price changes on the cost of living of different classes of consumers
residing at different places.
Utility of Consumer Price Index Number

Following are the uses of consumer’s price index number :

(1) Estimation of direction and degree of changes in
expenditure : The direction and degree of change in the cost of
living of people of a specific class may be estimated with the help of
this price index number.

(2) Control over the prices : Once the change in expenditure has
been estimated, it can guide measures to control changes in prices.

(3) In deciding dearness allowances : The salaries and
dearness allowances of labourers and salaried employees are
decided on the basis of the cost of living index number.

(4) Helpful in wage negotiations and wage contracts : These
index numbers are very helpful in wage negotiations and wage
contracts. Under such contracts, wages are adjusted automatically
for every specified rise in the cost of living index number.

(5) Helpful in deciding purchasing power of money : The
following formula is used in deciding purchasing power of money :

$$\text{Purchasing power of money} = \frac{1}{\text{Consumer price index number}} \times 100$$

(6) Helpful in deciding real wage : It helps to decide the real
wages of the people. The following formula is used for it :

$$\text{Real wage} = \frac{\text{Money income}}{\text{Consumer price index number}} \times 100$$
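A small Python sketch of these two formulae follows. The income of ₹5,000 and the index of 125 are hypothetical figures chosen only for illustration, and the purchasing power formula is taken in the form 100 / index, which is an assumption about the intended expression.

```python
def purchasing_power(cpi):
    """Purchasing power of one rupee relative to the base year (index base = 100)."""
    return 100 / cpi

def real_wage(money_income, cpi):
    """Money income deflated by the consumer price index number."""
    return money_income / cpi * 100

# Hypothetical figures, not taken from the text
income = 5000
cpi = 125

print(purchasing_power(cpi))   # a rupee buys 0.8 of what it bought in the base year
print(real_wage(income, cpi))  # money income expressed in base-year prices
```

With an index of 125, a money income of ₹5,000 is worth ₹4,000 at base-year prices.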


CONSTRUCTION OF COST OF LIVING INDEX 
NUMBER OR 
CONSUMER’S PRICE INDEX NUMBER 


The following points should be taken into account while
constructing a cost of living index number :

(1) Selection of consumer class : First of all, it is essential to
decide the class of people for whom the index number is to be
constructed. The index number may be constructed for industrial
labourers of a specific field, government employees, teachers or
farm labourers. The class for which the consumer price index number
is to be constructed must be, as far as possible, homogeneous in
income and consumption habits.

(2) Selection of base year : The base year should be as normal as
possible, i.e., free from unusual ups and downs.

(3) Conducting a family budget inquiry : To know how much an
average family of the selected consumer class spends on the
different commodities it consumes, the family budgets of some
selected families should be scrutinized. The families should be
selected by random sampling. The quantities of commodities
consumed by these families, the prices paid for them and the amounts
spent on them should all be noted down.

In brief, the following information is obtained by the family
budget survey :

(i) The average income of the specific class.

(ii) The average number of members in the family.

(iii) The quantity of various commodities consumed.

(iv) The part of the income spent on the commodities.

(v) Items of consumption : (a) food, (b) clothing, (c) fuel and
lighting, (d) house rent, (e) miscellaneous expenditure.

(4) Obtaining price quotations : It is the most important and the
most difficult task, because the retail price of a commodity may vary
from place to place, shop to shop and consumer to consumer. The
prices of the selected commodities should be collected from the
places where most people usually purchase them. After the price
quotations have been collected, their average should be worked out
and this average price should be used.

(5) Weighting : The different commodities of consumption are of
different importance to consumers, so weights should be assigned
according to the relative importance of these commodities.
Knowledge of the relative importance of commodities is obtained
from the family budget survey.

(6) Method of constructing consumer price index number :
Weights are assigned to commodities in two ways :

(a) The quantity (q0) of the commodity consumed in the base year
is taken as weight.

(b) The value (expenditure) (p0 q0) of the commodity in the
base year is taken as weight.

Corresponding to these quantity weights and value weights, there are
two methods of constructing a weighted consumer price index number :

(i) Aggregative expenditure method or weighted aggregative
method.

(ii) Family budget method or weighted relatives method.

(i) Aggregative expenditure method or weighted aggregative
method : Under this method the following steps are taken to
construct the weighted consumer price index number :

(a) The quantity (q0) of each commodity consumed in the base
year is multiplied by its price (p0) in the base year, and the total
of these products, Σp0q0, is determined. This is the aggregate
expenditure of the base year. While multiplying the quantity (q0) by
the price (p0), it should be kept in mind that their units must be
the same. If they are not, the unit of quantity should be converted
into the unit of price. For example, if the price is given per kg and
the quantity of the commodity is given in quintals, price and
quantity should be multiplied only after converting quintals into
kilograms.

(b) The quantity (q0) of the base year is multiplied by the price
(p1) of the current year, and the total of these products, Σp1q0, is
obtained. This is the aggregate expenditure in the current year.

(c) The consumer’s price index number is calculated by the
following formula :

$$\text{Current year's index number} = \frac{\text{Current year's aggregate expenditure}}{\text{Base year's aggregate expenditure}} \times 100$$

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

(ii) Family budget method or weighted relatives method :
Under this method the following steps are taken to find the
consumer price index number :

(a) The price relative (R) of each commodity is determined from
its price (p1) in the current year and its price (p0) in the base
year. The following formula is used to find price relatives :

$$\text{Price Relative} = \frac{\text{Current year's price}}{\text{Base year's price}} \times 100, \qquad R = \frac{p_1}{p_0} \times 100$$

(b) The aggregate expenditure on each commodity is used as its
weight. So for finding the weight (W), the price (p0) of each
commodity in the base year is multiplied by its quantity (q0)
consumed in the base year. Thus p0q0 is used as weight.

(c) The price relative of each commodity is multiplied by its
weight (p0 q0) and the products are totalled. Thus ΣRW is obtained.

(d) The total of weights (ΣW) is determined. It is also equal to
Σp0q0.

(e) The weighted consumer price index number is determined by the
following formula :

$$\text{Consumer price index number of current year} = \frac{\text{Total of products of price relatives and weights}}{\text{Total of weights}}$$

$$P_{01} = \frac{\sum RW}{\sum W}$$

Index numbers determined by the above two methods are equal,
because for every commodity RW = (p1/p0 × 100) × p0q0 = p1q0 × 100.


Illustration 1. 

Calculate consumer price index number for the year 2011 from the following 
data taking 2001 as base by aggregate expenditure method and family budget 
method : 

Commodities   Quantity consumed   Price per unit in    Price per unit in
              in 2001             year 2001 (in ₹)     year 2011 (in ₹)
A             100                  8.00                12.00
B              25                  6.00                 7.50
C              10                  5.00                 5.25
D              20                 48.00                52.00
E              25                 15.00                16.50
F              30                  9.00                27.00
Solution:

Calculation of Index No. by aggregate expenditure method

Commodity   Price of commodity     Quantity of commodity   Total Exp. in   Total Exp. in
            2001       2011        in base year            base year       current year
            (p0)       (p1)        (q0)                    (p0q0)          (p1q0)
A            8.00      12.00       100                      800            1200.00
B            6.00       7.50        25                      150             187.50
C            5.00       5.25        10                       50              52.50
D           48.00      52.00        20                      960            1040.00
E           15.00      16.50        25                      375             412.50
F            9.00      27.00        30                      270             810.00
Total                                                      Σp0q0 = 2605    Σp1q0 = 3702.50

Consumer price Index No. for year 2011 :

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Substituting the values from the table,

$$P_{01} = \frac{3702.50}{2605} \times 100 = 142.13 \text{ approximately}$$

Ans. : Consumer price index number was 142.13 for year 2011.
Calculation of Index Number by Family Budget Method

Commodity   Quantity in   Price in    Price in       Price Relative      Weight (W) or total    Price relative
            base year     base year   current year   R = p1/p0 × 100     exp. in base year      × weight
            (q0)          (p0)        (p1)           (R)                 (p0q0)                 (RW)
A           100            8.00       12.00          150.00              800                    1,20,000
B            25            6.00        7.50          125.00              150                      18,750
C            10            5.00        5.25          105.00               50                       5,250
D            20           48.00       52.00          108.33              960                    1,03,996.80
E            25           15.00       16.50          110.00              375                      41,250
F            30            9.00       27.00          300.00              270                      81,000
Total                                                                    ΣW = 2605              ΣRW = 3,70,246.80

Calculation of consumer price index number for year 2011 :

$$\text{Current year's consumer price index number} = \frac{\sum RW}{\sum W} = \frac{370246.80}{2605} = 142.129$$

Ans. : Consumer price index number for year 2011 is
approximately 142.13.
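The agreement between the two methods in Illustration 1 can be confirmed with a short Python sketch. The function names `cpi_aggregate` and `cpi_family_budget` are hypothetical, chosen here for illustration; the price relatives are kept unrounded, so the family budget result differs slightly from the hand calculation, which rounds R for commodity D to 108.33.

```python
# commodity: (q0, p0, p1), copied from Illustration 1
basket = {
    "A": (100, 8.00, 12.00),
    "B": (25, 6.00, 7.50),
    "C": (10, 5.00, 5.25),
    "D": (20, 48.00, 52.00),
    "E": (25, 15.00, 16.50),
    "F": (30, 9.00, 27.00),
}

def cpi_aggregate(basket):
    """Aggregate expenditure method: sum(p1*q0) / sum(p0*q0) * 100."""
    current = sum(p1 * q0 for q0, p0, p1 in basket.values())
    base = sum(p0 * q0 for q0, p0, p1 in basket.values())
    return current / base * 100

def cpi_family_budget(basket):
    """Family budget method: sum(R*W) / sum(W) with R = p1/p0*100, W = p0*q0."""
    total_rw = 0.0
    total_w = 0.0
    for q0, p0, p1 in basket.values():
        r = p1 / p0 * 100
        w = p0 * q0
        total_rw += r * w
        total_w += w
    return total_rw / total_w

print(round(cpi_aggregate(basket), 2), round(cpi_family_budget(basket), 2))
```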


Illustration 2. 
Construct consumer price index number from the following data : 
Class Index No. Weight 
Foodgrain 150 15 
House Rent 100 12 
Fuel and Light 110 9 
Clothes 120 10 
Miscellaneous 90 14 
Solution:

Class            Index No. (I)   Weight (W)   Weight × Index (WI)
Food             150             15           2250
Rent             100             12           1200
Fuel and Light   110              9            990
Clothes          120             10           1200
Miscellaneous     90             14           1260
Total                            ΣW = 60      ΣWI = 6900

$$\text{Consumer's price Index Number} = \frac{\sum WI}{\sum W} = \frac{6900}{60} = 115$$

Illustration 3.
Construct weighted price index number for 2009 and 2010 from the following
data by taking 2008 as base :

Commodity   Weight   Price per unit
                     2008 (₹)   2009 (₹)   2010 (₹)
A           4        20         21         24
B           3        1.25       1          1.50
C           2        5          8          2
D           1        2          2          3.25
Solution:

Construction of cost of living index number by family budget method or weighted
index number :

                      Base year 2008         Year 2009                          Year 2010
Commodity   Weight    Price    Price         Price   Price      Weight ×        Price   Price      Weight ×
            (W)                Relative              Relative   Relative                Relative   Relative
                                                     (R1)       (WR1)                   (R2)       (WR2)
A           4         20       100           21      105        420             24      120        480
B           3         1.25     100           1        80        240             1.5     120        360
C           2         5        100           8       160        320             2        40         80
D           1         2        100           2       100        100             3.25    162.5      162.5
Total       ΣW = 10                                  ΣWR1 = 1080                        ΣWR2 = 1082.5

$$\text{Index Number for year 2009} = \frac{\sum WR_1}{\sum W} = \frac{1080}{10} = 108$$

$$\text{Index Number for year 2010} = \frac{\sum WR_2}{\sum W} = \frac{1082.5}{10} = 108.25$$
Illustration 4. 


The following information were obtained from the budget-survey of the middle 
class families of a city : 


Expenses Food Clothing Rent Fuel and Elec. Misc. 
40% 20% 10% 10% 20% 


Price 2009 150 75 30 25 40 
Price 2010 162 87 60 30 50 


What change has taken place in the cost of living figure of 2010 as compared to 2009 ?
Solution:

Item             Price in   Price in   Price Relative   Weight   Weight × Relative
                 2009       2010       (R)              (W)      (WR)
Food             150        162        108              40       4320
Clothing          75         87        116              20       2320
Rent              30         60        200              10       2000
Fuel and Elec.    25         30        120              10       1200
Misc.             40         50        125              20       2500
Total                                                   ΣW = 100   ΣWR = 12340

$$\text{Index Number for year 2010} = \frac{\sum WR}{\sum W} = \frac{12340}{100} = 123.4$$

The increase in 2010 is 23.4% as compared to 2009.
Illustration 5. 
Find out consumer price index number of 2010 from the following data, taking
2000 as base by weighted aggregate method :

Commodity     Quantity consumed   Unit of price   Price in 2000   Price in 2010
              in 2000                             (in ₹)          (in ₹)
Wheat         2 Quintal           Quintal          400             600
Rice          1 Quintal           Quintal          620             880
Pulses        24 kg               Quintal         1000            2000
Sugar         24 kg               Quintal          900            1300
Salt          10 kg               kg                 2               4
Oil           12 kg               kg                40              52
Cloth         20 metre            metre             40              70
Fuel          10 Cylinder         Cylinder          70             105
House Rent    1 House             House            400            1000
Solution :

Pulses and sugar are consumed in kg but priced per quintal, so their
quantities are first converted : 24 kg = 0.24 quintal.

Commodity     Quantity (q0)   Unit of price   Price in 2000   Price in 2010   p0q0    p1q0
                                              (in ₹) (p0)     (in ₹) (p1)
Wheat         2 Quintal       Quintal          400             600             800    1200
Rice          1 Quintal       Quintal          620             880             620     880
Pulses        24 kg           Quintal         1000            2000             240     480
Sugar         24 kg           Quintal          900            1300             216     312
Salt          10 kg           kg                 2               4              20      40
Oil           12 kg           kg                40              52             480     624
Cloth         20 metre        metre             40              70             800    1400
Fuel          10 Cylinder     Cylinder          70             105             700    1050
House Rent    1 House         House            400            1000             400    1000
Total                                                                         4276    6986

$$\text{Consumer's price index number} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{6986}{4276} \times 100 = 163.38$$
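The unit conversion needed in Illustration 5 can be sketched in Python as follows. The helper `to_price_unit` and the tuple layout are assumptions made for this example; the only conversion it knows is 1 quintal = 100 kg.

```python
KG_PER_QUINTAL = 100

# (item, quantity, quantity unit, price unit, price 2000, price 2010)
items = [
    ("Wheat", 2, "quintal", "quintal", 400, 600),
    ("Rice", 1, "quintal", "quintal", 620, 880),
    ("Pulses", 24, "kg", "quintal", 1000, 2000),
    ("Sugar", 24, "kg", "quintal", 900, 1300),
    ("Salt", 10, "kg", "kg", 2, 4),
    ("Oil", 12, "kg", "kg", 40, 52),
    ("Cloth", 20, "metre", "metre", 40, 70),
    ("Fuel", 10, "cylinder", "cylinder", 70, 105),
    ("House Rent", 1, "house", "house", 400, 1000),
]

def to_price_unit(qty, qty_unit, price_unit):
    """Express a quantity in the unit in which the price is quoted."""
    if qty_unit == price_unit:
        return qty
    if qty_unit == "kg" and price_unit == "quintal":
        return qty / KG_PER_QUINTAL
    if qty_unit == "quintal" and price_unit == "kg":
        return qty * KG_PER_QUINTAL
    raise ValueError(f"cannot convert {qty_unit} to {price_unit}")

base_total = 0.0
current_total = 0.0
for name, qty, qty_unit, price_unit, p0, p1 in items:
    q0 = to_price_unit(qty, qty_unit, price_unit)
    base_total += p0 * q0       # base-year aggregate expenditure
    current_total += p1 * q0    # current-year aggregate expenditure

index_2010 = current_total / base_total * 100
print(round(base_total), round(current_total), round(index_2010, 2))
```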


Illustration 6. 

In the following series various group weights and group indices (except that of 
fuel and light) have been given for constructing cost of living index numbers : 

Food Clothing Fuel and Light Rent Miscellaneous 


Group Index No. 220 199 x 184 161 


Group Weights 35 16 20 9 20 

If the cost of living index number is 195, find out the group index of fuel and 
light. 
Solution :

Group            Index No. (R)   Weight (W)   RW
Food             220             35           7700
Clothing         199             16           3184
Fuel and Light   x               20           20x
Rent             184              9           1656
Misc.            161             20           3220
Total                            ΣW = 100     ΣRW = 15760 + 20x

$$\text{Cost of living index number } (P_{01}) = \frac{\sum RW}{\sum W}$$

Substituting the values,

$$195 = \frac{15760 + 20x}{100}$$

or 19500 = 15760 + 20x

or 19500 - 15760 = 20x

or 20x = 3740

x = 187

So group index no. of fuel and light will be 187.
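The same rearrangement can be written as a small Python function. The name `missing_group_index` and its arguments are assumptions for illustration; it simply solves ΣRW / ΣW = target for the one unknown group index.

```python
def missing_group_index(target_index, known_groups, missing_weight):
    """Solve for the unknown group index.

    known_groups is a list of (group index, weight) pairs for the groups
    whose indices are given; missing_weight is the weight of the group
    whose index is unknown.
    """
    total_weight = sum(w for _, w in known_groups) + missing_weight
    known_rw = sum(i * w for i, w in known_groups)
    return (target_index * total_weight - known_rw) / missing_weight

# Illustration 6: food, clothing, rent and miscellaneous are known
known = [(220, 35), (199, 16), (184, 9), (161, 20)]
fuel_and_light = missing_group_index(195, known, 20)
print(fuel_and_light)
```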
Illustration 7. 

Find the cost of living index number by the (Family Budget Method) from the 
table given below : 

2009 2010 

Item Quantity Price Price 

Rice 20 kg 1.00 2.00 

Wheat 50 ” 0.60 1.10 

Oil 10” 2.00 4.00 

Ghee 0.5” 8.00 15.00 

Sugar 5” 1.00 1.80 

Clothing 40 metre 2.00 3.75 

House Rent One House 40.00 75.00 

(Indore 2006) 


Solution : Calculation Table

             2009                 2010
Items        q0       p0          p1        W = p0q0    R = p1/p0 × 100    RW
Rice         20        1.00        2.00      20.00      200.00              4,000.00
Wheat        50        0.60        1.10      30.00      183.33              5,499.90
Oil          10        2.00        4.00      20.00      200.00              4,000.00
Ghee          0.5      8.00       15.00       4.00      187.50                750.00
Sugar         5        1.00        1.80       5.00      180.00                900.00
Clothing     40        2.00        3.75      80.00      187.50             15,000.00
House         1       40.00       75.00      40.00      187.50              7,500.00
Total                                        ΣW = 199.00                   ΣRW = 37,649.90

$$\text{Index No. for 2010} = \frac{\sum RW}{\sum W} = \frac{37649.90}{199.00} = 189.19 \text{ approximately}$$


Illustration 8. 

In the working class consumer price index number in a particular town, the 
weights corresponding to the different groups of items are following : 

Food 55, fuel 15, Clothing 10, Rent 8 and Miscellaneous 12. 


In October 2002, the dearness allowance was fixed by a mill of that town at 
182% of the worker’s wages which fully compensated for the rise in the price of 
food and rent, but did not for anything else. Another mill of the same town paid 
dearness allowance of 46.5%, which compensated rise in fuel and miscellaneous 
groups. It is known that rise in food is double that of fuel and miscellaneous double 
that of rent. Find rise of food, fuel, rent and miscellaneous groups. 


Solution :
Let the rise in fuel be X. Therefore, the rise in food was 2X.
Similarly let the rise in rent be Y.
Therefore, the rise in miscellaneous was 2Y.
The first mill compensated fully for the rise in food and rent but not
for anything else by paying 182% D.A., i.e., ₹282 to one getting
₹100.
The Index after rise for this mill will, therefore, be :

                 Index (R)   Weight (W)   Weight × Index (RW)
Food             2X          55           110X
Fuel             100         15           1500
Clothing         100         10           1000
Rent             Y            8           8Y
Miscellaneous    100         12           1200
Total                        100          3700 + 110X + 8Y

$$\text{I. No.} = \frac{\sum RW}{\sum W} = \frac{3700 + 110X + 8Y}{100} = 282 \quad \text{[Given I. No. = 282]}$$

or 3700 + 110X + 8Y = 28200
or 110X + 8Y = 28200 - 3700 = 24500
or 110X + 8Y = 24500 ....(i)

Similarly the second mill compensated fully for fuel and
miscellaneous by paying 46.5% D.A., i.e., ₹146.5 to one getting
₹100.

Index for this mill after rise will be :

                 Index (R)   Weight (W)   Weight × Index (RW)
Food             100         55           5500
Fuel             X           15           15X
Clothing         100         10           1000
Rent             100          8           800
Miscellaneous    2Y          12           24Y
Total                        100          7300 + 15X + 24Y

$$\text{I. No.} = \frac{\sum RW}{\sum W} = \frac{7300 + 15X + 24Y}{100} = 146.5 \quad \text{[Given I. No. = 146.5]}$$

or 7300 + 15X + 24Y = 14650
or 15X + 24Y = 14650 - 7300
or 15X + 24Y = 7350 ....(ii)
Multiply (i) by 3 and we get
330X + 24Y = 73500 ....(iii)
Subtract (ii) from (iii),
330X + 24Y = 73500
15X + 24Y = 7350
315X = 66150, so X = 210
Substituting the value of X in (i), we get
110 × 210 + 8Y = 24500
or 8Y = 24500 - 23100 = 1400, so Y = 175
Hence the rises are as follows :
Food 2X = 210 × 2 = 420
Fuel X = 210
Rent Y = 175
Miscellaneous 2Y = 175 × 2 = 350.
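Equations (i) and (ii) can also be solved mechanically. The sketch below uses Cramer's rule rather than the elimination shown above; `solve_2x2` is a hypothetical helper written for this example.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the equations have no unique solution")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# (i)  110X +  8Y = 24500
# (ii)  15X + 24Y =  7350
X, Y = solve_2x2(110, 8, 24500, 15, 24, 7350)
print(X, Y, 2 * X, 2 * Y)
```

Substituting back, (3700 + 110 × 210 + 8 × 175) / 100 = 282 and (7300 + 15 × 210 + 24 × 175) / 100 = 146.5, which agree with the two dearness allowance figures.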


Exercise (C) 


1. Prepare consumer price Index No. for 2011 from the following data taking 2010
as base :

Group        Price
          2010      2011
I         40.0      48.0
II         2.5       3.0
III       10.0      15.0
IV         4.0       6.0

Give weights to these four groups as 5, 4, 3 and 2 respectively.

[ Ans. 124.59] 

2. An enquiry into the budgets of the middle-class families in Bhopal gave the 
following 
informations : 

Food Rent Clothes Fuel Miscellaneous 

Expenditure 40% 20% 15% 10% 15% 

Price (2010) 100 40 60 20 50 

Price (2011) 150 60 75 25 80 

What changes are seen in cost of living data of 2011 in comparison of 2010 ? 


[ Hint : Here assume percentage as weight] 

[ Ans. I. No. = 145.25. Thus in 2011 the cost of living increased by 45.25% in
comparison with 2010]

3. Construct cost of living index number for 2011 based on 2008 from the 


following data : 
Group Group Index Weight 
Food 152 48 
Fuel and Electricity 110 5 
Clothes 130 15 
House rent 100 12 
Miscellaneous 80 20 


[ Ans. 125.96] 
4. Construct cost of living Index No. 


Item Food Rent Cloth Fuel Miscellaneous 
Expenditure 35% 15% 20% 10% 20% 

Price (2008) 150 50 100 20 60

Price (2011) 174 60 125 25 90


[ Ans. 126.1] 


5. Construct cost of living Index No. with the help of the following data : 
Price per unit 


Article Quantity consumed Unit Base year Current 


in Base year 2010 year 2011 
Wheat 4 Quintal 50 120 

Rice 1 Quintal 80 200 

Gram 1 Quintal 40 100 

Pulses 2 Quintal 80 200 

Ghee 50 kg 10 20 

Sugar 50 kg 1 3 

Firewood 5 Quintal 10 25 


House rent 1 House 50 100 

[ Ans. 226.1] 

6. Prepare consumers price index number for year 2011 from the figures given 
below taking year 2010 as base : 

Commodity Quantity Price in ₹ (2010) Price in ₹ (2011)

S 50 400 480 

T 40 25 30 

U 30 100 150 


V 20 40 60 
W 10 200 250 


[ Ans. 124.6] 


7. Construct cost of living index number from the data given below : 
Group Group Index Expenditure % 

Food 550 46 

Cloth 215 10 

Fuel 220 7 

House rent 150 12 

Misc. 275 25 


[ Ans. 376.65] 

8. The cost of living index number based on certain data for food, rent, fuel and 
light, clothing and miscellaneous was calculated as 205. The percentage 
increase in four items based on base year was as follows—Rent 60, Clothing 
210, Fuel and Light 120 and Miscellaneous 130. If weighted for the different 


item are given as follows, find the increase in food—Food 60, Rent 16, Fuel and 
Light 8, Clothing 12, Miscellaneous 4. 

[ Ans. 92.33] 

9. In 2010 for working class people food was selling at an average price of ₹160
per quintal, cloth at ₹20 per metre, house rent at ₹300 per house and other
items at ₹100 per unit. By 2011 the cost of food rose by ₹40 per quintal, house rent
by ₹150 per house and other items doubled in price. The working class living
index for 2011 (with 2010 as base) was 155. By how much did cloth rise in
price during the period 2000-2012 ?

[ Ans. ₹9 per metre]

10. In the following series various group weights and the group indices (except 


that of fuel light) have been given for constructing cost of living index number : 
Group Food Cloth Fuel and Light Rent Misc. 
Group Index No. 221 198 ? 183 161 
Group Weight 35 14 15 9 20 
If the cost of living index number is 193, find out the group index of fuel and light. 


[ Ans. 171] 

11. A worker living in Mumbai earns ₹350 per month. The cost of living Index
Number for a particular month is given as 136. From the following data find out 
the amount which he spent on house rent and clothing ? 

Group Expenditure Group Index 

Food 140 180 

Clothing ? 150 

House Rent ? 100 

Fuel 56 110 

Miscellaneous 63 80 

[ Hint : (i) Let expenditure on clothing be X and on house rent be Y.
Therefore 140 + X + Y + 56 + 63 = 350
or X + Y = 350 - 259 = 91 ....(ii)

(ii) Cost of living index number

$$\frac{(140 \times 180) + (X \times 150) + (Y \times 100) + (56 \times 110) + (63 \times 80)}{350} = 136$$

or 136 × 350 = 25200 + 150X + 100Y + 6160 + 5040
or 47600 = 150X + 100Y + 36400
or 150X + 100Y = 11200 ....(i)

Multiplying equation (ii) by 100 for finding the value of X,
150X + 100Y = 11200 ....(i)
100X + 100Y = 9100 ....(ii)
Subtracting,
50X = 2100
X = 42
X + Y = 91 and X = 42, so Y = 91 - 42 = 49]
[ Ans. : Expenditure on clothing is ₹42 and expenditure on house rent is ₹49.]
12. A shoe-maker in the city of Indore earns ₹450 per month. The cost of living
index number for a particular month is given as 140. Using the following data
find out the amount he spends on food and clothing.

Group          Food   Cloth   House rent   Fuel and light   Misc.
Exp. (in ₹)    ?      ?       100          60               90
Group index    150    120     150          115              140

[ Ans. ₹150, ₹50]
13. In working class budget inquiry in two cities Bhopal and Indore, it was found 
in 2010 that an average working class family’s expenditure on food and other 


item was as follows : 
Bhopal Indore 


Food 50% 64% 

Other item 50% 36% 

In 1986 the working class cost of living index stood at 279 for Indore and 265 for 
Bhopal (base year 1980 = 100). It was known that the rise in the price for 
articles consumed by the working class was the same for Bhopal and Indore, 
what was 1986 index for (i) food, and (ii) other item. 


[ Ans. (i) 315, and (ii) 215] 

THEORETICAL QUESTIONS 

Long Answer Type Questions 

1. What is Index Number ? Describe various problems in the construction of 
Index Number. 

2. What is Index Number ? How are they constructed ? Explain the importance of 
weights in the construction of general price index number. 

3. Define index number. Compare the fixed base and chain base methods in the 
construction of index number and describe their merits and demerits. 

4. Distinguish between simple and weighted index number. Explain methods of 
weighted aggregative and weighted average of price relative using arithmetic 
mean. 

5. Describe the fixed base and chain base methods of constructing index 
numbers. Distinguish between them and describe their merits and demerits. 

6. What is Index Number ? Discuss its utility. Describe briefly the procedure for 
constructing an Index Number of whole sale price. 

7. What is cost of living index number ? Explain the difficulties in its construction. 

8. What is cost of living index number ? Construct a cost of living index number 
taking an illusory example. 

9. Explain the conditions which an ideal index number satisfies. Taking any
two index numbers as examples, show which of these conditions they satisfy.

10. What is cost of living index number ? What purpose does it serve ? How
will you construct a cost of living index number for the teachers’ community of
your city ?

11. “Index Numbers are economic barometers”. Comment on this statement and
explain the utility of Index Numbers. (Rewa 2006)

12. Define Index Number and mention its uses. (Jiwaji 2006) 

13. What is an index number ? Examine the various problems involved in the 
construction of an index number. Discuss in brief the use of an index number. 


(Bilaspur 2005; Indore 2004) 
Short Answer Type Questions 


1. Distinguish between fixed base method and chain base method. 


2. What do you mean by splicing of index number ? Explain with example. 


OBJECTIVE QUESTIONS 
State whether following statements are true or false : 
1. Base Shifting is the geometric mean of your index number.
2. Marshall-Edgeworth’s index number satisfies time reversal test.
3. The quantity of base year in Paasche’s index number is used as a weight.
4. Splicing of Index Number is done to adjust the increment in price level.
5. The formula of Time Reversal Test is P01 × Q01 = 1.


6. Geometric mean is good for the construction of index number. 
7. Fisher’s Index number formula is an ideal formula. 
8. The same result are obtained by chain base index number and fixed base 


index number. 


9 Deflation of Index Number =— 


Fill in the blanks : 
1. Index Number is a ........ type of average. 


2. Index Number is constructed historically for the first time .......... 


3. The best average for the construction of index number is......... 
4. ........ index number is said to be ideal index number.
5. Two tests : (1) ........ and (2) ........ are suggested by Fisher.
6. Index Numbers are said to be ........ to measure economic variation.
7. Base year should be ........
8. Fisher’s ideal index number is ........
9. Quantity index number tells the variation in ........ from one period to another
period.
10. Family budget method is used to construct ........ index number.
11. ........ should be constructed to study the variation in price level of a
community of the people.

[Ans. 1. Special, 2. 1764, 3. Geometric mean, 4. Fisher, 5. Time reversal test
and factor reversal test, 6. Economic barometer, 7. Normal, 8. Geometric mean,
9. Quantity, 10. Consumer price index number, 11. Cost of living]
Choose the correct answers 


1. The best average for the construction of index number is : 
(a) Median (b) Geometric mean 
(c) Mode (d) Arithmetic mean 


2. An ideal index number satisfies the following test : 


(a) Unit test (b) Time reversal test 
(c) Factor reversal test (d) All of these 


3. Fisher’s ideal index number is : 

(a) Mathematical average of Laspeyre’s and Paasche’s index numbers
(b) Median of Laspeyre’s and Paasche’s index numbers

(c) Geometric mean of Laspeyre’s and Paasche’s index numbers
(d) None of these 


4. Time Reversal Test is true when : 


5. The formula of Fisher’s ideal index number is : 


: 5 
Po =P! x 100 Py =2PM x 100 
(a) = Pod (b) > pod 
=pPi'gor gy) Tmo =PaH 
Pe 00 Py= 5 PUL. Lc55 


(c) °°Smiaeas™ (gy "YEmae "Ee 
[Ans. 1. (b), 2. (d), 3. (c), 4. (b), 5. (d)]


DIAGRAMMATIC AND
GRAPHIC REPRESENTATION 
OF DATA 


After the data have been collected, they are classified and tabulated
so that they become understandable. The inferences then drawn from
the data by different statistical methods, e.g., average, dispersion,
index number etc., are presented in the form of figures which are dry
and unappealing to the common man. He generally cannot understand
these figures because he has no knowledge of statistical methods. So
it is essential that data should be presented by visual methods in
such a way that they become easily understandable to all. Dry data
can be made interesting and meaningful by pictorial presentation; the
reader can understand the subject matter displayed in a diagram
merely by inspection. There are various methods of presenting data
through diagrams, so it is sometimes difficult to decide which type
of data should be represented by which method in which situation. The
method which makes the data attractive and comprehensible is
selected. The process of making data interesting and comprehensible
through geometrical figures like bar diagrams, histograms, pie
diagrams, circles or pictures is known as diagrammatic
representation.

Advantages of Diagrammatic Representation 


Following are the main advantages of diagrammatic 
representation of data : 


(1) Attractive and impressive : Diagrams are attractive and they
make a lasting impression on the eyes and mind of the observer. A
common man who does not understand dry figures and mathematics can
understand the same matter properly through diagrams. The use of
different colours in a diagram naturally attracts the observer, so
he not only looks at the diagram with attention but also gains
information from it.

(2) To make data simple and understandable : Data can be
understood easily through diagrams, and the facts explained by
diagrams are easily accepted. So complicated groups of data can be
made simple and understandable through diagrams.

(3) To make comparison possible : Different data can be compared
with each other more effectively through diagrams. It is very
difficult to understand data by reading the figures, but the facts
can be understood merely by observing the diagrams, and the data
presented by them can also be compared easily.

(4) Greater effect : Diagrams remain in the memory for a longer time
than figures, so their effect is deeper and the matter expressed
through them is recalled for a longer time.

(5) Saving labour and time : Representing data through diagrams
saves time and labour. It can take hours to understand some data
which can be explained within minutes by diagrams.

(6) More information : Diagrams give more information than the
figures alone. A diagram shows not only the trend present in the data
but also the direction of change in that trend.

Limitations of Diagrammatic Presentation


Presentation of data by diagrams is very useful. It is simple to
understand and compare data through diagrams, but they cannot take
the place of the data. Following are the main limitations of
diagrammatic representation :

(1) Diagrams cannot present many kinds of information together; if
an effort is made to do so, they become complicated.

(2) Exact knowledge of the data may not be obtained through
diagrams, because diagrams are based on approximate values.

(3) Two dimensional and three dimensional diagrams are not easy to
understand, so they should be used as little as possible.

(4) Diagrams may give the data an attractive look, but this look
contributes nothing to the analysis of the data.

(5) Comparison of diagrams is possible only when they are made on
the basis of similar attributes. If dissimilar attributes of data are
presented in a diagram, they will produce a false and illusory
comparison.

(6) More care is essential at the time of interpretation with the
help of diagrams because they are easy, clear and attractive. It is
easy to mislead with their use.

(7) Very small and very big differences cannot be presented by
diagrams. For example, if the salaries of two employees are ₹ 5,000
and ₹ 5,050 respectively, it is difficult to differentiate between
them by observing the diagram; similarly, if the figures are ₹ 100
and ₹ 10,000 respectively, they cannot be shown together by a
diagram.

(8) After the data have been shown by diagrams, the diagrams cannot
be used for further analysis when the situation changes in future.


GENERAL RULES FOR CONSTRUCTING 
DIAGRAMS 


To make statistical diagrams attractive and effective, the following
rules should be followed :


(1) Attractive and correct : Diagrams should be attractive and
effective. They should attract the attention of people. But in
making a diagram attractive its correctness should not be
sacrificed. Incorrect diagrams may lead to fallacious conclusions,
so diagrams should be made on graph paper.

(2) Suitable size : A diagram should be neither too big nor too
small. It should be made of adequate size in the middle of the graph
paper. To make the diagram attractive, it should be bordered with a
bold line on all sides.

(3) Title : A clear, complete and brief title should be written at
the top of each diagram. It makes clear what is being shown by the
diagram.

(4) Suitable scale : A proper scale should be decided before making
the diagram. There is no universal rule of scale, yet a suitable
scale should be decided by looking at the size of the graph paper
and the characteristics of the data. The diagram should be made in
such a way that all the characteristics of the data are made clear.
If two diagrams are to be compared, the scale of both should be the
same. The scale should be written near the title of the diagram.

(5) To draw diagram : Geometrical tools like protractor, scale and
compass should be used for making diagrams. Diagrams should always
be made first with pencil; colours should be used later. If a diagram
is divided into percentages or parts, the figures should not be
written inside the diagram, because this makes the diagram look
untidy.

(6) Use of marks and colours : Separate marks or colours should be
used to distinguish the different characteristics of the data.


(7) To border the diagram : Diagrams should be bordered with bold
lines or double lines to make them attractive.

(8) Index : To explain the marks and colours used in a diagram, an
index should be given in a corner of the diagram. It makes clear
which characteristic of the data each mark or colour stands for.

(9) Selection of suitable diagram : There are various types of
diagrams in statistics, so making a diagram is not as hard as
selecting a suitable one. Different diagrams are used to present
different characteristics of data. So a suitable diagram should be
selected on the basis of the nature of the data, the purpose of
presentation, the ratio of minimum and maximum values and the
judgment and experience of the investigator. Keeping all these
things in mind, statistical data should be presented in the form of
diagrams or graphs.

KINDS OF DIAGRAMS

Generally the following types of diagrams are used in statistics :

(1) One dimensional diagrams : These are mainly bar diagrams.

(2) Two dimensional diagrams : Rectangle, square and circle
diagrams are included under it.

(3) Three dimensional diagrams : Cube and cylinder diagrams
belong to it. These are also said to be volume diagrams.

(4) Pictograms (5) Cartograms or Map Diagrams (6) Business
charts

Here only one dimensional, two dimensional and three
dimensional diagrams are explained in detail.


ONE-DIMENSIONAL DIAGRAMS 


Only one dimension, that is, height, is used in a one dimensional
diagram. These diagrams are in the form of lines or bars. Bars do
have a width, but since the width is the same for all bars it has no
separate effect. Bar diagrams are used when the difference between
the maximum and minimum values is not big.


Bar diagrams may be vertical or horizontal.
Bar diagrams may be of the following types :


1. Simple Bar Diagram 

In a simple bar diagram, bars are drawn with lengths proportional to
the values of the different items. The widths of the bars are the
same and they are drawn at equal distances. To make a simple bar
diagram, a suitable scale is decided by keeping in mind the highest
value among the items. Then all bars are drawn on the basis of this
scale. Colours are also used to make them attractive. The bars may
be drawn vertically or horizontally. Generally vertical bar diagrams
are used more often.
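
The conversion of item values into bar lengths can be sketched in a few lines of Python. This is only an illustration: the helper name bar_lengths and the scale of 1 cm = 5 million acres are assumptions made here, while the crop figures are those of Illustration 1 below.

```python
# Hypothetical helper: convert item values to bar lengths for a chosen scale.
def bar_lengths(values, units_per_cm):
    """Return the length in cm of each bar when 1 cm stands for units_per_cm."""
    return {name: value / units_per_cm for name, value in values.items()}

# Area under different crops (million acres), as given in Illustration 1.
crops = {"Rice": 40, "Wheat": 35, "Cotton": 32, "Jowar": 30,
         "Oilseeds": 22, "Fodder crops": 18}

# Assumed scale: 1 cm = 5 million acres, chosen so the longest bar is 8 cm.
lengths = bar_lengths(crops, units_per_cm=5)
for name, cm in lengths.items():
    print(f"{name:<13}{cm:4.1f} cm")
```

Choosing the scale from the highest value first, as the text advises, ensures that every other bar fits on the same paper.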

Vertical bars : When bars are drawn upright, they are known as
'vertical bars'. As far as possible, the bars, drawn proportional to
the values they represent, should be arranged with the highest bar on
the left and the smallest bar on the right. This sequence may also be
reversed. But when the data are given in order of time or in some
other important order, the bars should be drawn in the same order.
They are drawn on the basis of the scale.


Illustration 1.
Draw a simple bar diagram to represent the following data relating to the area
under different crops in India.

Crops            Area (million acre)
Rice             40
Wheat            35
Cotton           32
Jowar            30
Oilseeds         22
Fodder crops     18

Solution : Area under different crops in India

Illustration 2.

Represent diagrammatically the following infant mortality rates in different towns :
Mumbai 274, Nagpur 323, Paris 93, Kolkata 244, London 66, Oslo 23, Chennai
251 and Berlin 92.

Solution : Infant Mortality Rates in Different Towns

[Bar diagram. Vertical axis : Mortality Rate; one bar per town. Scale : 1 cm = 50 children.]



Horizontal bars : When bars are drawn not vertically but
horizontally, they are said to be 'Horizontal Bars'. Here the line
of measurement (scale) is taken at the top. The bars are drawn
proportional to the values they represent, with the biggest bar at
the top and the smallest bar at the bottom. But when the data are
given in order of time or in some other important order, the bars
should be drawn in the same order. This is decided according to the
question.


Illustration 3.
Represent the following data by horizontal bar diagram :

Items            Exp. (in ₹)
Food             300
Clothing         250
Rent             200
Education        150
Misc.            100
Saving           50

Solution : To make horizontal bars, the vertical line is assumed as the base
line and the scale is taken on the horizontal line.


Uses of bar diagram : These diagrams are suitable under the 
following cases : 

(1) Individual series; 

(2) Time series; 

(3) Spatial or Geographical series. 

Limitations of bar diagram : Bar diagrams should not be used in 
discrete series. 


Broken Bar Diagram 

Whenever bar diagrams, either vertical or horizontal, are made, the
bars begin from zero. But sometimes most of the given values are
small while a few are very large. In that case, if the scale is
decided with the large value in mind, very small bars are obtained
for the small values and the diagram does not look attractive or
suitable. On the other hand, if the scale is decided on the basis of
the small values, the bar for the large value may run off the paper.
To avoid this difficulty, broken bar diagrams are used. The scale is
decided keeping the small values in mind; the bar for the big value
is started from zero, broken in the middle and drawn longer than the
other bars.


Illustration 4.
Represent the following data diagrammatically :

Persons :                  A      B      C      D      E      F
Monthly Income (in ₹) :    400    450    350    500    600    4,000

Solution : Monthly Income of Some Persons

[Broken bar diagram. Horizontal axis : Person. Scale : 1 cm = ₹ 100. The bar for F (₹ 4,000) is broken in the middle.]


2. Deviation/Bilateral or Duo-directional Bar
Diagram

Two facts of opposite characteristics are shown by this type of bar
diagram. Vertical bars are drawn above and below the base line.
Horizontal bars are drawn to the left and right of the base line. The
zero line is always taken in the middle in each case, so the bars
are shown on both sides of the base.


Use of this diagram : The bilateral or duo-directional bar diagram
is useful when facts having opposite characteristics are to be shown
and compared. These bars are very advantageous because they compare
the separate facts as well as both components of the facts.

Illustration 5.

Represent the Reserve Bank of India's net credit (in crore ₹) to State
Governments for various years by duo-directional bar diagram :

Year          2005-06   2006-07   2007-08   2008-09   2009-10   2010-11
Net Credit    112       -81       60        100       -21       106


Solution : Net R.B.I. Credits to State Governments

[Duo-directional bar diagram : one bar per year from 2005-06 to 2010-11, drawn above the base line for positive values and below it for negative values.]

The above can also be shown by horizontal bars.


3. Subdivided Bar Diagram 


When the aggregate of the data is to be presented along with its
different parts or components, a subdivided bar diagram is used.
This is also said to be a component bar diagram. The different
components show their ratio to the aggregate and are comparable
with each other. The components are distinguished by separate
shades.


Illustration 6.

Represent the area covered under the high yielding varieties by subdivided bar
diagram :

Commodity        Million hectare
                 2009-10    2010-11
Wheat            7.9        10.2
Rice             7.4        8.6
Millets          2.9        3.7

Total            18.2       22.5

Solution :

Commodity        Million hectare
                 2009-10                 2010-11
                 Value    Cumulative     Value    Cumulative
Wheat            7.9      7.9            10.2     10.2
Rice             7.4      15.3           8.6      18.8
Millets          2.9      18.2           3.7      22.5

Total            18.2     -              22.5     -

[Subdivided bar diagram. Vertical axis : Production (Million hectare); horizontal axis : Year (2009-10, 2010-11). Scale : 1 cm = 4 million hectare. An index identifies the shading of each commodity.]



4. Percentage Subdivided Bar Diagram 

In this bar diagram the whole of each set of data is taken as a
total and its parts are shown as percentages of that total. All bars
have the same length and width; the differences lie in the
percentages of their parts. The diagram is used to show the relative
changes in the different divisions. To construct it, the total is
taken as 100 and all its parts are expressed as percentages.
Cumulative percentages are then calculated from these percentages.
Next, the height of the bar is taken as equal to 100 percent on a
proper scale (generally 1 cm = 10 percent). Then the bar is divided,
starting from the base line, at heights equal to the cumulative
percentages of the different divisions. The separate divisions are
shaded with different marks.
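
The percentage and cumulative percentage columns described above can be computed as in the following Python sketch. The function name percentage_table is an assumption made for illustration; the figures are the Family A expenditures of Illustration 9.

```python
from itertools import accumulate

def percentage_table(values):
    """Return (item, percentage, cumulative percentage) rows, taking the total as 100."""
    total = sum(values.values())
    percents = [100 * v / total for v in values.values()]
    cumulative = list(accumulate(percents))  # running total of the percentages
    return list(zip(values, percents, cumulative))

# Monthly expenditure of Family A, as given in Illustration 9 (total 900).
family_a = {"Food": 360, "Clothing": 162, "House Rent": 126,
            "Fuel and Light": 72, "Misc.": 180}

table = percentage_table(family_a)
for item, pct, cum in table:
    print(f"{item:<15}{pct:6.1f}{cum:7.1f}")
```

The last cumulative percentage should always come to 100, which is a convenient check before the bar is divided.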


Illustration 7.

Represent the expenditure of a person on different items by subdivided bar
diagram on percentage basis :

Items          Food   Clothing   House   Edu.   Fuel   Misc.   Saving
Exp. (in ₹)    240    160        120     80     40     80      80

Solution : Total = 240 + 160 + 120 + 80 + 40 + 80 + 80 = 800

Items        Expenses   Percentage Exp.           Cumulative Percentage
Food         240        (240/800) × 100 = 30      30
Clothing     160        (160/800) × 100 = 20      50
House        120        (120/800) × 100 = 15      65
Education    80         (80/800) × 100 = 10       75
Fuel         40         (40/800) × 100 = 5        80
Other        80         (80/800) × 100 = 10       90
Savings      80         (80/800) × 100 = 10       100

Percentage sub-divided bar diagram is as follows :

[Percentage sub-divided bar diagram not reproduced]

Illustration 8.
Show by suitable diagram the following information :

Crop             Area under different crops ('000 hectare)
                 2010-11    2011-12
Wheat            8.0        11.0
Rice             7.5        9.5
Sugarcane        3.0        4.5

Total            18.5       25.0

Solution :

Crop             2010-11                            2011-12
                 Area    Percent   Cum. percent     Area    Percent   Cum. percent
Wheat            8.0     43.2      43.2             11.0    44.0      44.0
Rice             7.5     40.6      83.8             9.5     38.0      82.0
Sugarcane        3.0     16.2      100.0            4.5     18.0      100.0

Total            18.5    100.0     -                25.0    100.0     -


Illustration 9.
Represent the following data by sub-divided bar diagram drawn on percentage
basis :

Items              Monthly expenditure of three families
                   A       B       C
Food               360     300     200
Clothing           162     140     120
House Rent         126     120     100
Fuel and Light     72      60      40
Misc.              180     130     140


Solution :
Change all components into percentages on the basis of the total
of each family :

Items              Family A              Family B              Family C
                   ₹      %     Cum. %   ₹      %     Cum. %   ₹      %     Cum. %
Food               360    40    40       300    40    40       200    33    33
Clothing           162    18    58       140    19    59       120    20    53
House Rent         126    14    72       120    16    75       100    17    70
Fuel and Light     72     8     80       60     8     83       40     7     77
Misc.              180    20    100      130    17    100      140    23    100

Total              900    100   -        750    100   -        600    100   -


Note : The percentages are calculated to the nearest integer.

[Percentage sub-divided bar diagram for families A, B and C, with an index of the items of expenditure.]

Illustration 10.
Represent the following by sub-divided bars drawn on percentage basis :
Cost, Proceeds, Profit or Loss per Chair during 1990, 2000 and 2010

Particulars               1990     2000     2010
Cost per chair :
(a) Wages                 4.5      7.5      10.5
(b) Other cost            3.0      5.1      7.0
(c) Polishing             1.5      2.4      3.5

Total cost (₹)            9.0      15.0     21.0
Proceeds per chair        10.0     15.0     20.0
Profit (+) Loss (-)       + 1.0    0.0      - 1.0


Solution :

Assuming the selling price per chair as 100, all figures are changed
into percentages and then, on a suitable scale, the required
percentage sub-divided bar diagram is made.

Particular              1990                   2000                   2010
                        Price   %     Cum. %   Price   %     Cum. %   Price   %       Cum. %
Wages                   4.5     45    45       7.5     50    50       10.5    52.5    52.5
Other cost              3.0     30    75       5.1     34    84       7.0     35      87.5
Polishing               1.5     15    90       2.4     16    100      3.5     17.5    105.0
Total cost              9.0     90    -        15.0    100   -        21.0    105.0   -
Profit/Loss             1.0     10    10       -       -     -        -1.0    -5.0    -5
Sale value per chair    10.0    100   100      15.0    100   100      20.0    100.0   100.0

[Percentage sub-divided bar diagram for 1990, 2000 and 2010, with index. Vertical axis : Percentage. Scale : 1 cm = 20 percent.]


5. Compound or Multiple Bar Diagram 

To compare different attributes of the data, multiple bars are drawn
side by side on the basis of a common scale. The bars are
distinguished by different colours or different shades according to
the characteristics they represent.

Such diagrams may be of different kinds :


(i) Double bar diagram : Here two bars are drawn side by side to
present two attributes or two periods of time.

Illustration 11.

The data given below show the profits (in thousand rupees) of two companies
A and B. Represent this data by means of a multiple bar diagram :

Year         Profit (₹ in thousand)
             Company A    Company B
2007-08      120          90
2008-09      135          95
2009-10      140          108
2010-11      160          120
2011-12      175          130

Solution :

[Double bar diagram. Vertical axis : Yearly Profit in '000 Rupees; horizontal axis : years 2007-08 to 2011-12; separate bars for Company A and Company B. Scale : 1 cm = 40 thousand.]



(ii) Treble bar diagram : Here three bars are drawn side by side
to present three attributes, three conditions of the same attribute
or three periods of time. A scale is necessary.

Illustration 12.

Show by a treble bar diagram the basic data relating to major foodgrains :

Production (million tonne)

Years        Rice     Wheat    Coarse grains
2009-10      40.43    20.09    27.29
2010-11      42.23    23.83    30.54
2011-12      42.74    26.47    24.39


Solution :

[Treble bar diagram with index. Scale : 1 cm = 5 million tonne.]

Illustration 13.
Draw a multiple bar diagram to represent the following data :

Year     Sales (₹ '000)    Gross Profit (₹ '000)    Net Profit (₹ '000)
2008     100               30                       10
2009     120               40                       15
2010     130               45                       25
2011     150               50                       25


Solution :

[Multiple bar diagram. Index : Sale, Gross Profit, Net Profit. Scale : 1 cm = ₹ 25 thousand.]

(iii) Multiple bar diagram : When more than three attributes, or
more than three forms or conditions of one attribute, are to be
represented, a separate bar is drawn side by side for each attribute
or condition. For example, if in Illustration 12 another foodgrain
were also to be represented, one more bar would be drawn alongside
for it.


Illustration 14.
Represent the following data by means of multiple bars :

Faculty        No. of Students
               2009-10    2010-11    2011-12
Arts           1,200      1,100      900
Science        800        900        1,200
Commerce       400        500        350
Law            300        400        550


Solution :

[Multiple bar diagram. Scale : 1 cm = 200 students.]

Exercise 14 (A)

Simple Bar Diagram

1. Prepare a bar diagram from the data given below relating to the strength of Iraq
in the Gulf area :

Tank              5,900
Machine gun       4,100
Fighter Plane     1,100
Missile           800
Launchers         450


2. Draw a horizontal bar diagram from the following data relating to the distance
covered by six bus routes in one day :

Route No. :                A     B     C     D     E     F
Distance Covered (km) :    50    60    65    80    90    100

3. The following table gives data relating to the production of rice per hectare
in India. Represent these by bar diagram :

Year :               2006-07   2007-08   2008-09   2009-10   2010-11   2011-12
Production (kg) :    688       1,013     1,123     1,235     1,552     1,482


4. Represent diagrammatically the monthly income (in ₹) of some persons :

Persons :    A      B      C      D      E      F
Income :     500    400    450    800    600    5,000


5. Represent the data relating to the population of India by simple bar diagram :

Year :                     1901   1911   1921   1931   1941   1951   1961   1971   1981   1991   2001
Population (in crore) :    23.6   25.2   25.1   27.9   31.9   36.1   43.9   54.8   68.3   84.5   102.7

Duo-Directional Bar Diagram


6. Represent the following data by a suitable bar diagram :
(Increase in the production in cotton industry)

(Increase in the production in iron-steel industry)
Decrease in sugar production 40%

Decrease in cement production 30%


7. Represent the following data pertaining to Indian Railways by suitable bar
diagram :

Particulars           2009-10    2010-11    2011-12
Gross Income          390        422        468
Gross Expenditure     331        353        389
Net Income            59         69         79


8. Draw a duo-directional bar diagram from the following data :
Profit of Indian Railways (in crore rupees)

Year     Net Profit    Expenses    Gross Profit
2007     6             2           8
2008     5             3           8
2009     3             2           5
2010     6             3           9
2011     4             2           6

Subdivided Bar Diagram/Percentage Subdivided Bar Diagram


9. Represent the following data relating to the expenditure under different heads
during the second Five Year Plan in a State in India by subdivided bar diagram :

Head                    Expenditure (Rupees in crore)
Industry                127.0
Irrigation              92.5
Agriculture             100.0
Transport and Roads     92.5
Miscellaneous           68.0

Total                   480.0


10. Represent the following data by a subdivided bar diagram :
Number of Students in Various Faculties

             Arts    Science    Commerce    Agriculture    Total
College A    80      120        40          60             300
College B    50      75         45          30             200     (Gorakhpur 2004)


11. Draw subdivided bar diagram from the following data :

Article          Expenditure in rupees
                 Family A    Family B
Food             300         500
Rent             200         350
Cloth            125         250
Education        110         225
Miscellaneous    75          125
Saving           90          150


12. Prepare a suitable bar diagram for the following information :

Items        Expenditure in rupees
             Family A    Family B
Food         1,200       1,500
Cloth        800         1,200
Rent         600         1,200
Education    600         900
Others       800         1,200


13. Represent the following data by a subdivided bar diagram :

Head                       Commodity
                           A           B
Selling Price per unit     ₹ 3         ₹ 2
Quantity Sold              75 Units    100 Units
Value of Raw Material      ₹ 175       ₹ 150
Other Production Exp.      ₹ 30        ₹ 25
Profit                     ₹ 20        ₹ 25

14. Represent the following data by a subdivided bar diagram :

Items of Expenditure    Family A    Family B    Family C
Food                    50          45          60
Cloth                   20          25          20
Rent                    10          10          10
Education               5           10          5
Misc.                   15          10          5

Total                   100         100         100


15. Prepare a subdivided bar diagram or percentage subdivided bar diagram for
the data below pertaining to the number of students in a college :

Faculty      2009    2010    2011
Arts         200     150     300
Commerce     250     350     500
Science      300     400     400

Total        750     900     1,200


16. Represent the following by subdivided bar diagram :

Cost                       Sale Price per Heater (in rupees)
                           2010    2011
Cost per heater            30      50
Wages                      15      20
Raw Material               10      12
Other Expenses             13      13
Profit (+) or Loss (-)     -5      +5


17. A furniture manufacturer supplies you the following data. Represent the same
by a suitable diagram showing the amount of profit or loss per article :

Cost (in ₹)

Heads             Chair    Easy Chair    Table
Materials         10       15            20
Labour            8        18            20
Other Expenses    2        5             10
Total Cost        20       38            50
Selling Price     25       40            45

Multiple Bar Diagram


18. Represent the following data by double bar diagram :

Year :                2007    2008    2009    2010    2011
Export (crore ₹) :    73      80      85      80      76
Import (crore ₹) :    70      72      74      85      90

19. Draw multiple bar diagram for the following data :

Year        Export (in crore ₹)    Import (in crore ₹)
2008-09     610.36                 624.65
2009-10     955.39                 742.78
2010-11     660.65                 578.36
2011-12     565.22                 527.98


20. The following data relate to the number of students in four faculties of a college.
Construct a multiple bar diagram :

Faculty         Number of Students
                2008-09    2009-10    2010-11    2011-12
Arts            800        750        650        700
Science         700        780        850        850
Commerce        600        750        950        1,200
Home Science    250        220        200        150


21. Represent the following by multiple bar diagram :

Result         Number of Students
               2008    2009    2010    2011
First Div.     10      14      12      22
Second Div.    20      28      24      20
Third Div.     30      40      50      45
Failure        20      25      30      35

TWO DIMENSIONAL DIAGRAMS 


In bar diagrams data are represented only by length, but in two
dimensional diagrams they are shown by both height and width. So the
areas in these diagrams are proportional to the values of the items.
That is why they are also said to be surface diagrams. Two
dimensional diagrams are of the following types : (1) Square diagram,
(2) Circular or Pie diagram, (3) Rectangular diagram.

(1) Square Diagram

When the difference between the minimum and maximum values is large,
bar diagrams are not constructed. For example, for the two values
100 and 3,600 the larger bar would be 36 times as long as the
smaller bar. In such a situation a square diagram is used. To make
the squares, the square roots of the values are calculated and the
squares are constructed with sides proportional to these square
roots on the basis of a fixed scale.
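
The calculation of the sides can be sketched in Python as follows. The helper name square_sides and its divisor parameter are assumptions made for illustration; the tea and coffee figures and the sides of 5 cm and 2 cm are those of Illustration 1 of this section.

```python
import math

def square_sides(values, divisor=1.0):
    """Side of each square in cm: square root of the value divided by a fixed number."""
    return {name: math.sqrt(value) / divisor for name, value in values.items()}

# Consumption in kg, as given in Illustration 1 (square diagram).
consumption = {"Tea": 100, "Coffee": 16}

# Square roots are 10 and 4; dividing by 2 gives sides of 5 cm and 2 cm.
sides = square_sides(consumption, divisor=2)

# Value represented by 1 square cm is the same for every square.
per_sq_cm = {name: consumption[name] / sides[name] ** 2 for name in consumption}
print(sides)
print(per_sq_cm)
```

Because every square uses the same divisor, one square centimetre stands for the same quantity in each square, which is what makes the areas comparable.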


Illustration 1.
Represent the following data in the form of square diagram :

Item                   Tea    Coffee
Consumption (in kg)    100    16

Solution :

Here a bar diagram will not be suitable because one bar will be very
big and the other very small, so it will be difficult to compare
them. Hence we determine the square roots of these numbers, which
are 10 and 4 respectively, i.e., the sides of the squares will be in
the ratio 5 : 2. We construct two squares with sides 5 cm and 2 cm.

It is necessary to determine the scale in a square diagram. The area
of a square is equal to the square of its side. In the above example
the side of the square representing 100 is 5 cm, so its area is
5² cm² = 25 cm².

∵ 25 square cm represents 100
∴ 1 square cm represents 100/25 = 4
∴ 1 square cm = 4

Scale : 1 cm² = 4 or 1 cm = 2

Illustration 2.

Represent the following information by square diagram :
Fourth Five Year Plan Outlays of Some States

State :                        UP       Andhra Pradesh    Rajasthan    Nagaland
Amount (Rupees in crore) :     1,023    421               300          42

Solution :

State             Rupees        Square Root    Length of side of square
                  (in crore)                   in cm (square root ÷ 10)
UP                1,023         31.98          3.198
Andhra Pradesh    421           20.52          2.052
Rajasthan         300           17.32          1.732
Nagaland          42            6.48           0.648

[Square diagram : one square per state, beginning with U.P.]


Rectangular Diagram 


Quantities are also compared by the areas of rectangles. Rectangles
are used in those situations in which two characteristics of the
data are to be compared simultaneously. Rectangular diagrams are of
three kinds :

(a) Simple (b) Divided (c) Percentage Subdivided

(a) Simple rectangular diagram : Two attributes of the data are
compared simultaneously in these diagrams. If the average daily wage
and the number of labourers of a factory are multiplied, the total
daily wage bill is obtained. In the same way, if one side of a
rectangle is taken as the average wage per day and the second side
as the number of labourers, the area of the rectangle represents the
total wage of one day.
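
As a rough sketch of this idea in Python, the sides and area of each rectangle can be worked out as below. The figures are those of Illustration 3; the scales of 1 cm = ₹ 2 for wage and 1 cm = 50 labourers are assumptions made here for illustration, and the variable names are hypothetical.

```python
# Average daily wage (₹) and number of labourers, as in Illustration 3.
factories = {"A": (5, 200), "B": (4, 300)}

# Assumed scales for the two sides of each rectangle.
RUPEES_PER_CM = 2      # wage side: 1 cm = ₹ 2
LABOURERS_PER_CM = 50  # labour side: 1 cm = 50 labourers

total_wage = {}
dimensions_cm = {}
for name, (wage, labourers) in factories.items():
    # Area of the rectangle stands for the total daily wage bill.
    total_wage[name] = wage * labourers
    dimensions_cm[name] = (wage / RUPEES_PER_CM, labourers / LABOURERS_PER_CM)

print(total_wage)      # total daily wage of each factory in ₹
print(dimensions_cm)   # (wage side, labour side) of each rectangle in cm
```

With these scales one square centimetre stands for ₹ 100 of daily wages, so the two areas can be compared directly.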

Illustration 3.

Suppose the average daily wage per labourer in factory A is ₹ 5 and there are
in all 200 labourers. The average daily wage per labourer in factory B is ₹ 4 and
there are in all 300 labourers. Express this information by rectangular diagram.


Solution :

[Rectangular diagram. Axes : Labour and Wage. One rectangle shows the total daily wage of Factory A and the other that of Factory B. Scale : 1 cm = 50 labourers; 1 cm = ₹ 2.]



(b) Divided rectangular diagram : These are similar to divided
bar diagrams. The only difference is that their width is also
proportional to a value of the variable. Thus both the length and the
width of such diagrams are drawn to scale. These diagrams are used
to represent three different but mutually related facts.
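
The per-unit figures needed to divide each rectangle can be computed as in the following Python sketch. The function name per_unit_breakdown is an assumption made for illustration; the data are those of Illustration 4 below.

```python
from itertools import accumulate

def per_unit_breakdown(quantity, components):
    """Per-unit amount and cumulative per-unit amount of each component (2 decimals)."""
    per_unit = [amount / quantity for amount in components.values()]
    cumulative = accumulate(per_unit)  # heights at which the rectangle is divided
    return {name: (round(u, 2), round(c, 2))
            for name, u, c in zip(components, per_unit, cumulative)}

# Commodity A: 40 units sold; Commodity B: 20 units sold (Illustration 4).
commodity_a = per_unit_breakdown(40, {"Raw material": 26, "Other exp.": 32, "Profit": 22})
commodity_b = per_unit_breakdown(20, {"Raw material": 24, "Other exp.": 21, "Profit": 15})

print(commodity_a)
print(commodity_b)
```

The last cumulative figure for each commodity equals its selling price per unit, which is the height of its rectangle.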


Illustration 4.
Represent the following data by means of subdivided rectangular diagram :

Head                       Commodity
                           A           B
Selling Price per Unit     ₹ 2         ₹ 3
Quantity Sold              40 Units    20 Units
Value of Raw material      ₹ 26        ₹ 24
Other Production Exp.      ₹ 32        ₹ 21
Profit                     ₹ 22        ₹ 15

Solution :

                                                      A                B                Scale
Height of the rectangle (Selling price per unit)      ₹ 2              ₹ 3              1 cm = ₹ 0.5
Width of the rectangle (Quantity Sold)                40 units         20 units         1 cm = 20 units
Area of the rectangle (Total Sales)                   40 × 2 = ₹ 80    20 × 3 = ₹ 60    1 cm² = ₹ 10

To draw the subdivided rectangles, calculate the following :

Item                  A                                B
                      Per Unit           Cumulative    Per Unit           Cumulative
Raw material          26/40 = ₹ 0.65     0.65          24/20 = ₹ 1.20     1.20
Other Exp.            32/40 = ₹ 0.80     1.45          21/20 = ₹ 1.05     2.25
Profit                22/40 = ₹ 0.55     2.00          15/20 = ₹ 0.75     3.00

Total (Sale value)    ₹ 2.00                           ₹ 3.00

Required rectangular diagram will be as follows :

[Subdivided rectangular diagram not reproduced]

Illustration 5.
Represent the following data by a subdivided rectangular diagram :

                                                 Commodity
                                                 I      II
Price per unit of commodity (rupees)             10     12
Quantity sold                                    20     24
Value of raw materials used (rupees)             100    120
Other expenses of production (rupees)            60     96
Profit                                           40     72


Solution :

Item                   Per Unit I       Cumulative I    Per Unit II      Cumulative II
Raw Material           100/20 = ₹ 5     5               120/24 = ₹ 5     5
Other Exp.             60/20 = ₹ 3      8               96/24 = ₹ 4      9
Profit                 40/20 = ₹ 2      10              72/24 = ₹ 3      12
Total (Sale Amount)    ₹ 10                             ₹ 12

[Subdivided rectangular diagram. Vertical axis : Cost, Profit etc. per unit (in rupees). Index : Profit, Other expenses, Raw material.]

(c) Percentage subdivided rectangular diagram : If two or more
quantities are to be compared and they are to be subdivided, then
rectangles are used for ease of comparison and each component is
represented as a percentage, taking the total as 100. In such
situations the widths of the rectangles are proportional to the
quantities, while the height of all rectangles is the same and
represents 100. Such diagrams are used to compare the family budget
data of different families.


Illustration 6.
Represent the following data by percentage subdivided rectangular diagram,
mentioning the steps you will take in the construction :

Faculty                  No. of Students
                         College A    College B
Engineering              440          840
Business Management      320          420
Agriculture              240          140

Total                    1,000        1,400

Solution :

Steps :
1. Calculate the percentage and cumulative percentage as shown in
the calculation table.
2. Fix the widths of the bases of the rectangles in the ratio of the
total students, 1,000 : 1,400, i.e., 5 : 7.
3. Construct two rectangles (for College A and College B), taking
the height as 100 on a suitable scale.
4. Divide each rectangle into the faculties on the basis of the
percentages and cumulative percentages.
5. Distinguish the different parts by different shadings or colours.

Calculation Table of Percentage and Cumulative Percentage

Faculty                  College A                        College B
                         Number    %      Cumulative %    Number    %      Cumulative %
Engineering              440       44     44              840       60     60
Business Management      320       32     76              420       30     90
Agriculture              240       24     100             140       10     100

Total                    1,000     100                    1,400     100

The required diagram is as follows :

[Percentage subdivided rectangular diagram for College A and College B not reproduced]

Illustration 7.

Show the details of monthly expenditure of two families given below by means
of two-dimensional diagram :

Items of Expenditure     Family
                         A (₹)    B (₹)
Food                     120      160
Clothing                 80       100
House Rent               60       120
Education                40       80
Fuel                     20       40
Misc.                    40       60
Income                   400      600


Solution :
Widths of the rectangles are taken in the ratio of 400 : 600

Items of Expenditure     Family A (₹ 400)           Family B (₹ 600)
                         ₹      %      Cum. %       ₹      %       Cum. %
Food                     120    30     30           160    26.7    26.7
Clothing                 80     20     50           100    16.7    43.4
House Rent               60     15     65           120    20.0    63.4
Education                40     10     75           80     13.3    76.7
Fuel                     20     5      80           40     6.7     83.4
Misc.                    40     10     90           60     10.0    93.4
Saving                   40     10     100          40     6.6     100.0

Total                    400    100                 600    100

[Percentage subdivided rectangular diagram. Vertical axis : Expenditure (in percentage), up to 100. Rectangles : Family A (Rs. 400) and Family B (Rs. 600).]

(2) Circular or Pie Diagram 


Circles are also used for the comparative study of data. Pie diagrams 
are used in the same situations as square diagrams. Since the area of 
a circle depends on the square of its radius, the circles are drawn 
with radii proportional to the sides of the corresponding squares, 
that is, to the square roots of the values. Take care that the centres 
of the circles lie on the same straight line. 


Drawing circles instead of squares has two advantages. Firstly, 
circles are easy to draw and look attractive. Secondly, the 
components of the data can be shown by dividing the circle into 
sectors. In a subdivided pie diagram, the angle for each item is 
calculated by taking the total as 360°. The circle is then divided 
on the basis of these angles. 


Illustration 8. 
Represent the following by a circular diagram : 

Commodity    Production (lakh ton) 
Rice              427.35 
Wheat             264.77 
Jowar              77.53 
Bajra              53.57 


Solution : 

Commodity    Production     Square Root    Ratio of Radii    Radius of the 
             (lakh ton)                                      circle (in cm) 
Rice           427.35          20.67           2.067             ≈ 2.0 
Wheat          264.77          16.27           1.627             ≈ 1.6 
Jowar           77.53           8.805          0.8805            ≈ 0.9 
Bajra           53.57           7.32           0.732             ≈ 0.7 
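
The radii in the table can be reproduced as follows. This is a minimal sketch; the factor of 0.1 cm per unit of square root is the scale implied by the table (20.67 becomes 2.067 cm), and the name `circle_radii` is illustrative.

```python
import math

# Production in lakh tons, from Illustration 8.
production = {"Rice": 427.35, "Wheat": 264.77, "Jowar": 77.53, "Bajra": 53.57}

# Assumed scale: 1 unit of square root is drawn as 0.1 cm.
CM_PER_ROOT_UNIT = 0.1

def circle_radii(values, cm_per_root_unit):
    """Radius proportional to the square root, so area is proportional to value."""
    return {name: math.sqrt(v) * cm_per_root_unit for name, v in values.items()}

radii = circle_radii(production, CM_PER_ROOT_UNIT)
for name, r in radii.items():
    print(f"{name}: radius = {r:.3f} cm")
```

Because the area of a circle is proportional to the square of its radius, taking radii in the ratio of the square roots keeps the areas in the ratio of the original values.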
To Determine the Scale of a Pie Diagram : To determine the scale of a pie 
diagram we calculate the area of the circle. The area is πr², 
where π = 22/7 and r = radius of the circle. 

After determining the area of the circle, we find the value represented 
by 1 cm². This is the scale. 

For example, if the radius of a circle is 2 cm and Rs. 1,760 is represented 
by it, the scale is calculated as follows : 

Area = 22/7 × (2)² = 22/7 × 4 = 88/7, or about 12.57 cm² 
∴ 12.57 cm² = Rs. 1,760 
∴ 1 cm² = Rs. 1,760 × 7/88 = Rs. 140 

Scale : 1 cm² = Rs. 140 (1 sq cm = Rs. 140) 
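
The scale calculation above can be sketched in code. Exact fractions are used so that π = 22/7 gives the same round figure as the hand calculation; the function name `value_per_sq_cm` is illustrative.

```python
from fractions import Fraction

# The text takes pi as 22/7; Fraction keeps the arithmetic exact.
PI_APPROX = Fraction(22, 7)

def value_per_sq_cm(total_value, radius_cm, pi=PI_APPROX):
    """Value represented by 1 square centimetre of the circle."""
    area = pi * radius_cm ** 2      # area of the circle in sq cm
    return total_value / area

scale = value_per_sq_cm(1760, 2)
print(scale)  # 140
```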


Illustration 9. 

Draw a pie diagram to represent the following data showing the units of 
electricity sold to different classes of consumers during a month by an electricity 
supplying company : 

Consumer Class       Units Sold 
Motive Power           56,000 
Light and Fan          29,000 
Domestic               13,000 
Street Lighting         2,000 

Solution : 

Calculation of angles of the subdivided pie diagram : 
Total number of units sold = 1,00,000 = 360° 
∴ 1,000 units sold = 360°/100 = 3.6° 
∴ 56,000 units = 3.6° × 56 = 201.6° ≈ 202° 
   29,000 units = 3.6° × 29 = 104.4° ≈ 104° 
   13,000 units = 3.6° × 13 = 46.8° ≈ 47° 
    2,000 units = 3.6° × 2 = 7.2° ≈ 7° 
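
The angle calculation can be written as a small function. This is a sketch under the same rule as the text (total = 360°); the name `sector_angles` is illustrative.

```python
# Units sold, from Illustration 9.
units_sold = {
    "Motive Power": 56000,
    "Light and Fan": 29000,
    "Domestic": 13000,
    "Street Lighting": 2000,
}

def sector_angles(values):
    """Angle of each sector in degrees, taking the total as 360 degrees."""
    total = sum(values.values())
    return {name: 360 * v / total for name, v in values.items()}

angles = sector_angles(units_sold)
rounded = {name: round(a) for name, a in angles.items()}
for name in units_sold:
    print(f"{name}: {angles[name]:.1f} degrees, drawn as {rounded[name]}")
```

In this example the rounded angles add up to exactly 360°. That is not guaranteed in general, so after rounding it is worth checking the sum and adjusting one sector by a degree if necessary.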
Required pie-diagram is as follows : 
Illustration 10. 

Represent the data of cost of different items in building a house in Delhi State 
by pie-diagram : 

Item :            Labour   Bricks   Cement   Steel   Wood   Supervision 
Expenditure % :     25       15       20      15      10        15 

Solution : 

Calculation of corresponding angles : 

100 = 360°  ∴ 1 = 360°/100, i.e., 3.6° 

Items          Exp. percent    Angle 
Labour             25          3.6 × 25 = 90° 
Bricks             15          3.6 × 15 = 54° 
Cement             20          3.6 × 20 = 72° 
Steel              15          3.6 × 15 = 54° 
Wood               10          3.6 × 10 = 36° 
Supervision        15          3.6 × 15 = 54° 
Total             100          360° 

To Construct the Diagram : First of all construct a circle. Draw an 
angle of 90° at the centre of the circle. Then, moving anti-clockwise, 
draw the angles of 54°, 72°, 54°, 36° and 54° in succession. 
Distinguish each sector by a different shade. The required 
pie-diagram is given below : 


Illustration 11. 
Represent the following data by a pie diagram : 


Item              Expenditure (in Rs.) 
Food                  180 
Clothing               70 
Rent                   80 
Education              24 
Litigation             80 
Fuel                  120 
Miscellaneous          46 


Solution : Total expenses = Rs. 600 
∵ Rs. 600 = 360°, ∴ Re. 1 = 360°/600 = 0.6° 
So, Rs. 180 = 180 × 0.6 = 108° 
Rs. 70 = 70 × 0.6 = 42° 
Rs. 80 = 80 × 0.6 = 48° 
Rs. 24 = 24 × 0.6 = 14.4° ≈ 14° 
Rs. 80 = 80 × 0.6 = 48° 
Rs. 120 = 120 × 0.6 = 72° 
Rs. 46 = 46 × 0.6 = 27.6° ≈ 28° 
Marking off the angles in the clockwise direction, the required pie-diagram 
is as follows : 
Illustration 12. 

The plan outlays for the Third and Fourth Five Year Plans of India are shown in the 
following table. Represent them through a subdivided circular diagram : 

(Rupees in crore) 
Head                              Third Plan    Fourth Plan 
Agriculture                         1,089          2,728 
Irrigation                            665          1,087 
Power                               1,252          2,448 
Industry and Minerals               1,967          3,631 
Transport and Communications        2,112          3,237 
Social Service and Misc.            1,491          2,771 
Total                               8,576         15,902 

Solution : 
Calculation of the radii of the two circles for the Third and Fourth Plans : 

Plan       Total Exp.    Square root    Radius of the circle (in cm) 
Third        8,576          92.6              1.5 
Fourth      15,902         126.1              2.0 

Let 92.6 = 1.5 cm; then 1 = 1.5/92.6 cm, 
so 126.1 = (1.5/92.6) × 126.1 ≈ 2 cm. 

Computation of angles for different sectors : 

Items                             Third Plan               Fourth Plan 
                                  (Rs. in crore)   Angle   (Rs. in crore)   Angle 
Agriculture                          1,089          46°       2,728          62° 
Irrigation                             665          28°       1,087          25° 
Power                                1,252          52°       2,448          55° 
Industry and Minerals                1,967          82°       3,631          82° 
Transport and Communications         2,112          89°       3,237          73° 
Social Service and Misc.             1,491          63°       2,771          63° 
Total                                8,576         360°      15,902         360° 

Expenditure of Third and Fourth Five Year Plans 
[Diagram: two subdivided circles. Third Plan: Rs. 8,576 crore; Fourth Plan: Rs. 15,902 crore] 


CARTOGRAMS OR MAP DIAGRAMS 


Statistical facts can be represented very attractively by map 
diagrams. Population density, distribution of rainfall, yields, 
temperature, language, mineral resources, etc. can generally be 
represented in map diagrams. They can be made more 
meaningful and attractive by the use of various colours or shades. A statistical 
map presents numerical information on a geographical basis. Separate 
shades or colours are used to present different facts. 


[Map: India, Annual Average Rain] 


PICTOGRAM OR IDEOGRAPHS OR ISOTYPE 
METHOD 


In this method, the figures are presented by pictures of the related 
facts, e.g., population by pictures of men, production of milk by milk 
cans, production of perfume by small bottles of perfume. 

Nowadays this method has become very popular. It is also very effective, 
because pictures have a quick and lasting effect. The biggest drawback of these 
diagrams is that they are very difficult to make. One symbol is used for a 
definite number. If a number other than a multiple of that definite number is to be 
presented, it may be very difficult to draw the symbols in the right ratio. When constructing 
these diagrams, great care is needed so that the symbols are in the ratio of the 
data. This type of diagram is mostly used in advertisements. 


Illustration 13. 
The Production of milk of two countries A and B is 11 thousand ton and 6 
thousand ton respectively. Represent this information by pictogram. 


Solution : 

One can of milk is taken to represent 2,000 tons of milk. To show 11 thousand tons, 5 
full cans and one half can are used. To show 6 thousand tons, 3 full cans are 
drawn. 


[Pictogram: Milk Production of Two Countries, one row of cans for Country A and one for Country B. Scale: 1 can = 2,000 tons] 


Illustration 14. 
Represent the population of India and America for 1981 by pictograms : 


India America 
66 crore 27 crore 


Solution : One picture of a man = 6 crore persons 
For India : 66/6 = 11 pictures. For America : 27/6 = 4.5 pictures. 
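
The number of symbols in Illustrations 13 and 14 follows from a simple division. The sketch below returns the full symbols and the fraction of one extra symbol; the function name `symbol_count` is illustrative.

```python
def symbol_count(value, value_per_symbol):
    """Return (number of full symbols, fraction of one further symbol)."""
    full, remainder = divmod(value, value_per_symbol)
    return int(full), remainder / value_per_symbol

# Illustration 13: 1 can = 2,000 tons of milk.
print(symbol_count(11000, 2000))  # (5, 0.5): five full cans and one half can
print(symbol_count(6000, 2000))   # (3, 0.0): three full cans

# Illustration 14: 1 picture = 6 crore persons.
print(symbol_count(66, 6))        # (11, 0.0)
print(symbol_count(27, 6))        # (4, 0.5): four and a half pictures
```

Choosing a value per symbol that divides the data evenly, or leaves a simple half, avoids the difficulty of drawing awkward fractions of a symbol.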
Exercise 14 (B) 


1. Prepare a rectangular diagram to represent the following information : 

Factory                   A        B 
Wages in Rupees         3,000    2,000 
Material in Rupees      5,000    3,000 


2. Represent the following data by a two-dimensional diagram : 

                         Family A              Family B 
Items of Expenditure     (Income Rs. 400       (Income Rs. 800 
                          per month)            per month) 
Food                        150                   250 
Clothing                     75                   200 
House Rent                   60                   100 
Fuel                         40                    50 
Misc.                        75                   200 


3. Represent the following data by means of two-dimensional diagram : 


Particulars Family A Family B 
Income per month rupees 500 800 
Items of Expenditure : 

Food 200 250 

Clothing 100 200 

House Rent 80 100 

Fuel and Light 40 50 
Miscellaneous 80 200 


4. Represent the following data by a suitable (rectangular) diagram : 


Particulars              Article (Rupees) 
                           A        B 
Price per unit             20       30 
Units sold                 40       20 
Cost of Raw material      260      240 
Wages                     320      210 
Profit                    220      150 


5. Represent the following data by a suitable (Rectangular) diagram : 
Particulars Cost per unit 


Article I Article II 
Quantity 10 12 

Total Cost) 100 60 
Raw material 500 480 
Wages 300 168 

Profit 200 72 

Selling Price 1,000 720 


6. Represent the following data by a subdivided rectangular diagram : 
Head                       Commodity 
                           A            B 
Price per unit             Rs. 3        Rs. 2 
Quantity Sold              75 Units     100 Units 
Value of Raw material      Rs. 175      Rs. 150 
Other Expenses             Rs. 30       Rs. 25 
Profit                     Rs. 20       Rs. 25 


7. Represent the following data by a subdivided rectangular diagram : 


Cost per ton 2010 2011 
Wages 12.74 7.95 
Material 5.46 4.51 

Other Expenses 0.56 0.50 


Total 18.76 12.96 

Selling price per ton 19.91 12.16 

Profit (+) Loss (—) 1.15 —0.80 

Pie-Diagram 

8. Represent the following information by simple circular diagram. 

Country Area (in crore acre) 

A 590.4 

B 320.5 

C 190.5 

D 81.3 

9. Represent the Gross Income of Doordarshan (in crore rupees) by a circular 
diagram : 

Year : 2005—06 2007-08 2009-10 2011-12 

Income : 60.20 136.30 253.85 300.60 


10. Show the following data by means of a pie diagram : 
Production of three commodities in tons 


A        B        C      Total 
3,260    1,850    900    6,010 


GRAPHICAL REPRESENTATION 


Graphical representation is the second important method of 
representing statistical data. It is the most suitable method for 
presenting time series and frequency distributions in an effective 
and attractive form. 

A graph is made of lines and curves drawn by joining the points 
plotted on graph paper. The rises, falls and curvature of these 
lines and curves make a strong impression on the mind. 
Hence, compared with tabulation, it is a more effective method: 
the characteristics of the data can be understood at a glance. 

In general, graphical presentation is especially helpful in the 
analysis of the following facts : 


Advantages of Graphic Presentation 

(1) Display of time series and frequency distribution : The 
graphical method is very effective and useful in the presentation of 
time series and frequency distributions. If data on the general 
price level for 10 or 15 years are given as a time series, the secular 
trend and short-term fluctuations can easily be determined by the 
graphical method. 

In the same way, the graph of a frequency distribution shows 
whether the frequency curve is symmetrical or asymmetrical 
(skewed), J-shaped, or bell-shaped like the normal distribution. 


(2) Location of positional averages : The median, the first and third 
quartiles and the mode are called positional averages in statistics. The 
values of the mode, median and first and third quartiles (Q1 and Q3) can 
be located with the help of graphs of the frequency distribution. 

(3) Ease in interpolation and forecasting : Intermediate values 
and values outside the given range of a statistical series can be estimated 
easily by the graphical method. If population data are given from 
1961 to 1991, the population of 1988 or 2001 can be estimated 
by constructing the population curve. 

(4) Study of correlation : The correlation between two variables 
can be judged with the help of graphs. If the graphs of the two 
variables run close to each other, the correlation between them 
is close; conversely, if the lines of the two variables run far apart, 
there is a lack of correlation between them. 


(5) Comparison between multiple figures : Comparison 
between multiple phenomena can be done easily with the help of 
graphs. 


Due to the above advantages, graphs are extensively used to 
represent statistical data. 


Demerits of Graphic Presentation 

The following demerits are found in graphical presentation : 

(1) A graph shows only trends and fluctuations. It does not give 
the actual values. 

(2) Perfect accuracy is not possible in a graph. 

(3) A graph cannot be quoted as an example in support of a 
statement. 

(4) Only some characteristics of the data can be shown by a 
graph; not all characteristics can be represented. 

(5) Not all graphical representations are simple and easy to 
understand. In particular, ratio measures and multistage data cannot be 
represented easily by a graph. 


CONSTRUCTION OF GRAPH 


Generally, graphs are constructed by joining together the points 
plotted on graph paper. For this, keeping in mind the characteristics 
of the data, a point of intersection of lines on the graph paper is 
taken as the origin : 

(1) Vertical and horizontal lines intersecting at the origin are 
drawn in pencil or ink and made bold so that they appear clearly. 

(2) The vertical line drawn through the origin is known as the ordinate. 
Generally it is denoted by YY'. 

(3) The horizontal line drawn through the origin is known as the abscissa. 
It is denoted by XX'. 

(4) Thus the graph paper is divided into four parts. Each of these 
parts is known as a quadrant. 

(5) On the X-axis, positive values are shown to the right of the origin 
and negative values to the left. Thus + 1, + 2, + 3, + 4, etc. are 
plotted to the right of the origin (OX) and − 1, − 2, − 3, − 4, etc. 
are plotted to the left of the origin (OX'). Similarly, on the Y-axis, 
positive values are shown above the origin (OY) and negative values 
below the origin (OY'). This is clear from the following illustration : 

Generally the data relating to economic and business fields are 
positive, so in statistics the first quadrant is used to draw the graph. 
The origin is taken at the lower left corner of the graph paper. 


GENERAL RULES FOR GRAPHING DATA 

The following are the general rules for constructing a graph : 

(1) Title : Each graph should have a suitable and complete title, 
which tells the reader what data the graph presents. 

(2) Structural framework : The independent variable should be taken 
on the X-axis and the dependent variable on the Y-axis. The scale 
of the Y-axis starts at the origin 'O'. When plotting the data, bear 
in mind that each value of the independent variable has one 
definite corresponding value of the dependent variable. 

(3) Choice of scale : Selection of the scale is very important. 
The scale should be chosen in such a way that the whole of the data 
can be accommodated on the graph paper. There is no fixed rule for it; 
only the proportion between the vertical and horizontal scales should be kept in mind. 
The abscissa should be about one and a half times the ordinate. 

(4) Use of false base line : The vertical scale should normally 
start at zero (0). When the variation in the values of the variable is 
small in comparison with their size and it is essential to show that 
variation, the vertical scale should be broken. In that case the 
values need not start from the origin (O); only the required range of 
values is shown. The part of the scale lying between zero and the 
minimum value is omitted, and the vertical scale is broken to 
indicate this. 

(5) Use of ratio or logarithmic scale : A ratio scale or logarithmic 
scale should be used to show proportional changes in a series. 

(6) Line designs : The plotted points and lines should be clear in the 
graph. If more than one line is plotted on a graph, it is essential 
to differentiate between the lines. 

(7) Caption : The caption of the X-axis is written below the horizontal 
line OX, while the caption of the Y-axis is written alongside the 
middle of the Y-axis. 

(8) Index : An index should be given at the top to show the scales 
and the meaning of the various lines. 

(9) Source : If possible, the source of the data and essential 
notes should be given below the graph, so that their correctness 
can be checked. 

(10) Data should be given near curves : The table of data should be 
given along with the graph, near the curves, so that anyone who 
wishes can study the data in detail or check the graph's correctness. 


TYPES OF GRAPH 


Graphs are of two types : 

(1) Graphs based on natural scale (2) Graphs based on ratio scale 

Here we will discuss graphs on natural scale only. 


(1) Natural Scale 

Under this method the scale on the graph paper is natural, that is, 
it increases arithmetically. For example, if 1 cm 
= 10 crore population then 2 cm = 20 crore population. There are two 
types of graph on natural scale : 

(A) Time series graph or historigram (B) Graph of frequency 
distribution 


(A) Time series graph or historigram : In a time series graph, time 
is the independent variable. It is shown on the X-axis in the form of years 
or months, and one or more variables dependent on it are shown on the 
Y-axis. A proper scale is used so that all the data can be shown on the graph paper. 
For example, the scale on the Y-axis may be 1 cm = 10 crore population and on the 
X-axis 1 cm = 10 years. The selection of scale depends on the 
size of the graph paper and the range of the data. On natural 
scale, time series graphs are made in two ways : 
(i) Absolute historigram 
(ii) Index historigram 


(i) Absolute historigram : When the original data are plotted on the 
graph paper, the graph is said to be an absolute historigram. 
It may be constructed to represent one variable or two or more variables. 

(ii) Index historigram : When, in place of the actual or original 
values, their index or relative values are plotted on the graph paper, 
the graph is said to be an index historigram. 

Types of time series graphs : There are 5 types of time series 
graphs : 

(a) Horizontal line graph 

(b) Band graph 

(c) Silhouette chart or net balance graph 

(d) Range graph 

(e) Z chart (ZEE chart) or Z-curve 

(a) Horizontal line graph : In this graph, different types of lines are 
used for the different variables plotted against time. 


Uses of Graphic Curves 


Graphic curves are used in two ways : 
(1) To represent time series, 
(2) To represent frequency distributions. 


Graph of time series : A graph constructed to 
represent a time series is known as a historigram. In the construction of a 
historigram, time (year, month, etc.) is always plotted on the abscissa 
(X-axis) and the values on the ordinate (Y-axis). Graphs of time series 
are constructed on two types of scale : (1) natural scale (2) ratio 
scale. 

Time series graph on natural scale : Under this method, the 
scale of the graph is natural, that is, it increases arithmetically; 
e.g., if 1 cm represents 10 units then 2 cm will represent 
20 units. Graphs of time series on natural scale may be 
constructed in the following two ways : 

(i) Absolute historigram : When the original data are plotted, 
the graph is said to be an absolute historigram. It may 
be constructed to represent the values of one variable or of two or more 
variables. 

(ii) Index historigram : When, in place of the real or original values, 
their index or relative values are plotted on graph paper, the graph 
is said to be an index historigram. Relative changes are visualised 
through this type of graph. It may also be constructed to represent 
the values of one or more variables. 
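
The relative values plotted in an index historigram can be computed as below. This is a sketch only: the data are hypothetical, and taking the first period as the base (= 100) is an assumption, since the text does not fix a base period.

```python
def index_numbers(values, base_position=0):
    """Express each value as a percentage of the value in the base period."""
    base = values[base_position]
    return [100 * v / base for v in values]

# Hypothetical production figures for four consecutive years.
production = [50, 60, 75, 90]

indices = index_numbers(production)
print(indices)  # [100.0, 120.0, 150.0, 180.0]
```

Plotting `production` against time gives an absolute historigram; plotting `indices` gives the corresponding index historigram, which makes series measured in different units comparable.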


ABSOLUTE HISTORIGRAM 


Absolute historigram of one variable : This type of graph is 
constructed on the basis of the original values. 


Illustration 1. 
With the help of the figures given below, prepare a suitable graph : 

Trends in Sales of Units 
by Unit Trust of India 

Year            Sales of Units 
(July-June)     (Rs. crore) 
2002-03            19.1 
2003-04             2.2 
2004-05             9.2 
2005-06            15.3 
2006-07            17.2 
2007-08            22.8 
2008-09            18.0 
2009-10            15.1 
2010-11            23.0 

Solution : 


Two or more variables and time series graph : Time series 
graphs of two or more variables can also be constructed. If the units are 
homogeneous, both quantities are plotted on the same 
vertical scale. If, on the other hand, the quantities are 
heterogeneous, different scales have to be taken on the Y-axis, as 
is clear from the following example : 


FALSE BASE LINE 


When constructing a graph, the general rule is that the vertical 
scale (Y-axis) should start from zero at the point of origin. This 
rule is not essential for the horizontal scale (X-axis). Starting the 
vertical scale from zero sometimes causes difficulties. For example, 
if the values to be plotted are very large and the differences between 
them are small, starting from zero leads to the following problems : 


(1) The curve will lie far away from the base line, and the graph paper 
between the curve and the base line will be wasted. 


(2) If the values are large but the changes in them are very small, 
the changes cannot be represented clearly: a scale large enough to 
show them clearly would require a very big sheet of graph paper. 

(3) The representation will be ineffective: with a scale that fits 
the paper the fluctuations will not be visible, and with a scale that 
shows them a very big graph paper will be required, a huge part of 
which will remain wasted. 

To remove these difficulties and make the graph effective, a false base 
line should be adopted. Where possible, however, a false base line should be avoided, 
because the basic aim of graphical representation is to show reality, 
and a broken scale may not show the real situation. 

Illustration 2. 
Prepare a suitable graph from the following data : 

Year         Deficit Financing 
             (Rs. crore) 
2001-02        114.51 
2002-03        155.14 
2003-04        166.86 
2004-05        171.84 
2005-06        172.26 
2006-07        295.29 
2007-08        206.29 
2008-09        262.38 
2009-10        290.11 
2010-11        229.57 

Solution : 

Illustration 3. 


The following table shows India's Foreign Trade (in crore rupees). Show these 
data graphically : 

Year    Month        Import     Export     Balance of Trade 
2011    April        143.90     156.00       + 12.11 
        May          127.62     153.77       + 26.15 
        June         138.26     140.86       +  2.60 
        July         148.82     141.37       −  7.45 
        August       146.38     171.15       + 24.77 
        September    151.67     160.89       +  9.22 
        October      138.07     149.85       + 11.78 
        November     109.52     166.74       + 57.22 
        December     131.92     159.83       + 27.91 
2012    January      174.90     162.12       − 12.78 
        February     175.38     145.88       − 29.50 
        March        187.45     259.43       + 71.98 

Solution : 

Graph showing Import, Export and Balance of Foreign Trade 
of India, 2011-12 
GRAPHS OF DOUBLE SCALES 


Sometimes the quantities to be represented are measured in two 
different units. Such quantities or values are represented with the help of 
double scales. One variable is shown on the ordinate on the left side and 
the other variable on the ordinate on the right side. The two scales are adjusted 
in such a way that there is a proportional relationship between 
the two variables : 


Illustration 4. 
Represent the following graphically : 

Employment Exchanges Statistics 

Year           No. of Exchanges at      No. of Registrations effected 
(Jan.-Dec.)    the end of the Year      during the year (in lakh) 
1999                296                      27.33 
2000                325                      32.30 
2001                342                      38.45 
2002                353                      41.52 
2003                365                      38.32 
2004                376                      39.58 
2005                396                      38.71 
2006                399                      39.12 
2007                405                      40.39 
2008                416                      42.01 
2009                429                      45.16 
2010                437                      51.31 
2011                441                      57.65 


Solution : 
[Graph on double scales: No. of Employment Exchanges and No. of Registrations. Abscissa: 1 div. = 1 year] 
GRAPHS OF FREQUENCY DISTRIBUTIONS 

Frequency distributions are of two types: discrete frequency 
distribution and grouped frequency distribution. The graphs of 
discrete frequency distribution are of the following types : 

(i) Line frequency diagram, (ii) Bar frequency diagram, (iii) 
Frequency polygon, 

(iv) Cumulative frequency polygon or step-form graph. 

The graphs of grouped frequency distribution are of the following 
types : 

(i) Histogram, (ii) Frequency polygon, (iii) Frequency curve, 

(iv) Cumulative frequency polygon and ogive. 

Other types of graph : (i) Galton graph, (ii) Lorenz curve. 


LINE FREQUENCY DIAGRAM 


This is the graph of a discrete frequency distribution. To construct it, 
the values of the variable are taken on the X-axis and the corresponding 
frequencies on the Y-axis, and on a suitable scale a vertical line 
equal in height to the frequency is drawn at each value of the variable. 

Note : If bars of definite width are drawn instead of vertical lines, 
the diagram is known as a frequency bar diagram. 

Illustration 5. 

Represent the following data graphically : 

No. of Children :   1    2    3    4    5    6 
No. of Families :   4    6   15   10    5    3 


Solution : 


A discrete series can be represented in two ways (line frequency 
diagram or frequency bar diagram). 


[Diagrams: line frequency diagram and frequency bar diagram. X-axis: No. of Children (1 to 6)] 

FREQUENCY POLYGON FOR DISCRETE 
FREQUENCY DISTRIBUTION 


The size in a discrete series, or the mid-value in a continuous series, is taken on the 
abscissa (X-axis) and the corresponding frequencies are taken on the ordinate 
(Y-axis), and then the corresponding points are plotted. The graph 
constructed by joining these points is known as a frequency 
polygon. 

Illustration 6. 

Represent the following discrete series graphically either by bars on the X-axis or 
by a frequency polygon : 

Height (in cm)    148  149  150  151  152  153  154  155  156 
No. of Persons      5    8   10   15   25   20   16   15   10 

Solution : 


Taking size (here height) on the abscissa and frequencies on the ordinate, 
the frequency bar diagram of the discrete series will be as follows : 


[Frequency bar diagram. Scale: abscissa 1 cm = 1 cm of height; ordinate 1 cm = 5 persons] 


or 

Taking size (here height) on the abscissa and frequencies on the ordinate, 
the corresponding points are plotted. Joining these 
points by straight lines, we obtain the frequency polygon as follows : 


[Frequency polygon. X-axis: Height (in cm), 148 to 156; Y-axis: No. of Persons] 

HISTOGRAM 


This is the graph of a grouped frequency distribution. In this diagram a rectangle 
is constructed for each class. If the class intervals are inclusive, they are first converted into 
exclusive form. Thus as many rectangles as there are classes are drawn side 
by side. Classes are taken on the abscissa and frequencies on the ordinate. 
The area of each rectangle is proportional to the frequency. 

“Histogram is the group of vertical rectangles in which heights of rectangle are 
taken in the proportion of frequencies.” 

(a) In the case of equal class widths, the heights of the rectangles are taken 
proportional to the frequencies, and the areas of the rectangles automatically become 
proportional to the frequencies. 


(b) In the case of unequal class widths, if the heights of the rectangles are taken in 
proportion to the frequencies, the areas of the rectangles will not remain proportional to 
the frequencies. To remove this defect, the following process is used : 

We reduce the heights of the rectangles in the same ratio in which the widths of the 
classes increase, so that the areas of the rectangles remain proportional to the 
frequencies. 


Generally, Height of rectangle ∝ Frequency / Width of C.I. 

i.e., Height of rectangle = (C × Frequency) / Width of C.I., 

where C is a constant. 
(i) If C = 1, then Height of rectangle = Frequency / Width of C.I. 
(ii) If C = lowest width of C.I., then 

Height of rectangle = (Lowest width of C.I. × Frequency) / Width of C.I. 

Utility : Mode can be located through histogram. 
Illustration 7. 


Draw a histogram for the following grouped frequency distribution : 

Daily Income (Rs.) :   0-10   10-20   20-30   30-40   40-50   50-60   60-70 
No. of Families :        4      12      20      28      24      18      12 

Solution : 


[Histogram. Scale: vertical 1 cm = 4 families; horizontal 1 cm = Rs. 10] 

Illustration 8. 
Represent the following data by Histogram : 

Weight in kg      35-40   40-45   45-50   50-55   55-60   60-65   65-70 
No. of Persons      12      30      22      30      18      10       9 

Solution : 


[Histogram. X-axis: Weight (in kg), 35 to 70] 


Illustration 9. 
Draw a Histogram on the basis of the following data : 

Mid-value     18   25   32   39   46   53   60 
Frequency     10   15   32   42   26   12    9 

Solution : 
Since mid-values are given, first form the class intervals by finding 
the upper and lower limits with their help : 

Class    14.5-21.5   21.5-28.5   28.5-35.5   35.5-42.5   42.5-49.5   49.5-56.5   56.5-63.5 
Freq.       10          15          32          42          26          12           9 
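
The class limits above can be derived from the mid-values as sketched below. The sketch assumes the mid-values are equally spaced, as they are here (a gap of 7, so half the gap is 3.5); the function name `class_limits` is illustrative.

```python
def class_limits(mid_values):
    """Lower and upper limit of each class from equally spaced mid-values."""
    half_width = (mid_values[1] - mid_values[0]) / 2
    return [(m - half_width, m + half_width) for m in mid_values]

mid_values = [18, 25, 32, 39, 46, 53, 60]
limits = class_limits(mid_values)
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

Each upper limit coincides with the next lower limit, so the rectangles of the histogram touch one another.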
Required diagram is as follows : 


[Histogram. Scale on X-axis: 1 cm = 5 units] 


Illustration 10. 


Draw a histogram for the following table : 

Class-interval :   0-9   10-19   20-29   30-39   40-49   50-59 
Frequency :         8     15      20      30      10       7 

Solution : 

Here the class intervals are in inclusive form. They can be changed 
into exclusive form as 
0-10, 10-20, 20-30, 30-40, 40-50, 50-60 by assuming that the 
value of the variable cannot be negative. But if the value of the variable 
may be negative, the class intervals will be taken as −0.5 to 9.5, 
9.5 to 19.5, ..., 49.5 to 59.5. Here the required histogram for the class intervals 0-10, 
10-20, ..., 50-60 is as follows : 
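
The conversion from inclusive to exclusive form with a half-unit correction can be sketched as follows. It assumes the same gap between every pair of consecutive classes; the function name `to_exclusive` is illustrative.

```python
def to_exclusive(inclusive_classes):
    """Convert inclusive classes such as 0-9, 10-19 to exclusive form.

    Half of the gap between one class's upper limit and the next class's
    lower limit is subtracted from lower limits and added to upper limits.
    """
    gap = inclusive_classes[1][0] - inclusive_classes[0][1]
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in inclusive_classes]

inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
exclusive = to_exclusive(inclusive)
print(exclusive[0], exclusive[-1])  # (-0.5, 9.5) (49.5, 59.5)
```

When the variable cannot take negative values, the illustration uses the simpler boundaries 0-10, 10-20, ..., 50-60 instead of starting at −0.5.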
Illustration 11. 

Represent the following data by histogram : 

Age (in Years)    No. of Persons 
0-10                    5 
10-20                  15 
20-30                  18 
30-40                  22 
40-50                  35 
50-70                  30 
70-100                 15 

Solution : 

Steps : (i) Take the class which has the minimum width. Here the 
minimum width is 10, so C = 10. 

(ii) Leaving unchanged the frequencies of the classes having width 10, adjust 
the frequencies of the other classes. Thus 

Height of rectangle ∝ Frequency / Width of C.I. 

i.e., Height of rectangle = (C × Frequency) / Width of C.I. = (10 × Frequency) / Width of C.I. 

Age       Frequency    Width of class    Height of rectangle 
                                         = (10 × Frequency) / Width of C.I. 
0-10          5             10              5 
10-20        15             10             15 
20-30        18             10             18 
30-40        22             10             22 
40-50        35             10             35 
50-70        30             20             (10 × 30)/20 = 15 
70-100       15             30             (10 × 15)/30 = 5 

(iii) On the X-axis, take the widths of the rectangles according to the 
class limits. 


[Histogram. Scale: 1 div. = 10 years] 
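
The adjustment of heights for unequal class widths can be checked with the sketch below, which applies Height = C × Frequency / Width with C defaulting to the smallest class width. The function name `adjusted_heights` is illustrative.

```python
def adjusted_heights(classes, c=None):
    """Heights of histogram rectangles for classes given as (lower, upper, frequency).

    c is the constant C of the formula; by default the smallest class width.
    """
    widths = [upper - lower for lower, upper, _ in classes]
    if c is None:
        c = min(widths)
    return [c * freq / width for (_, _, freq), width in zip(classes, widths)]

# Data of Illustration 11: (lower limit, upper limit, frequency).
ages = [
    (0, 10, 5), (10, 20, 15), (20, 30, 18), (30, 40, 22),
    (40, 50, 35), (50, 70, 30), (70, 100, 15),
]

heights = adjusted_heights(ages)
print(heights)  # [5.0, 15.0, 18.0, 22.0, 35.0, 15.0, 5.0]
```

Passing `c=1` gives the plain frequency density (frequency per unit of class width), which is the form used when heights are computed as f / i.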
Illustration 12. 
Represent the following data by means of histogram : 

Weekly (in Rs.)    No. of Workers 
100-150                140 
150-200                380 
200-250                540 
250-300                300 
300-400                240 
400-600                240 
600-800                160 


Solution : 
Here the class-intervals are unequal, so the frequencies must be 
adjusted to decide the heights of the rectangles. 

Height of rectangle H = Frequency / Width of class = f / i 

Class-interval    Width of class (i)    Frequency (f)    H = f / i 
100-150                 50                  140          140/50 = 2.8 
150-200                 50                  380          380/50 = 7.6 
200-250                 50                  540          540/50 = 10.8 
250-300                 50                  300          300/50 = 6.0 
300-400                100                  240          240/100 = 2.4 
400-600                200                  240          240/200 = 1.2 
600-800                200                  160          160/200 = 0.8 

Required Histogram is as follows : 
METHOD OF OBTAINING MODE WITH THE HELP 
OF HISTOGRAM 


(1) The rectangle having the maximum height is the rectangle of the 
modal class. Draw a line from the top right-hand corner of this 
rectangle to the top right-hand corner of the preceding rectangle. 
Similarly, join the top left-hand corner of the rectangle having 
maximum height to the top left-hand corner of the succeeding rectangle. 

(2) From the point where these two lines intersect, 
draw a perpendicular to the abscissa. 

(3) The point at which this perpendicular meets the abscissa is the 
mode. Its value is read off the scale. 
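
The point located by this construction can also be computed. For classes of equal width, solving for the intersection of the two lines gives Mode = L + h × (f1 − f0) / ((f1 − f0) + (f1 − f2)), where L is the lower limit of the modal class, h its width, and f0, f1, f2 the frequencies of the preceding, modal and succeeding classes. The sketch below applies this; the function name is illustrative, and the figures used are those of Illustration 13 in this chapter.

```python
def mode_from_histogram(lower, width, f_prev, f_modal, f_next):
    """x-coordinate where the two diagonals of the construction intersect.

    Assumes equal class widths and that the modal class has neighbours
    on both sides.
    """
    d1 = f_modal - f_prev   # excess of modal frequency over the preceding class
    d2 = f_modal - f_next   # excess of modal frequency over the succeeding class
    return lower + width * d1 / (d1 + d2)

# Illustration 13: modal class 30-40 (36 students),
# preceded by 20-30 (30 students) and followed by 40-50 (26 students).
mode = mode_from_histogram(lower=30, width=10, f_prev=30, f_modal=36, f_next=26)
print(mode)  # 33.75
```

A value read from a carefully drawn histogram should agree with this figure to within the accuracy of the drawing.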

Illustration 13. 
Represent the following by histogram and also locate the mode : 


Marks 0-10 10—20 20-30 30-40 40-50 50-60 60—70 70-80 80-90 90-100 
No. of Students 8 15 30 36 26 20 18 16 12 10 


Solution : 
FREQUENCY POLYGON 


A frequency polygon is a graph of a discrete or continuous 
frequency distribution. The frequency polygon of a grouped frequency 
distribution may be constructed in two ways : 


(a) After Constructing the Histogram : After constructing the 
histogram, find the mid-point of the upper horizontal side of the 
rectangle constructed on each class. These mid-points are joined 
successively by straight lines, and then the two end points of this 
diagram are extended to the base line. 

(b) Without Histogram : A frequency polygon without a histogram 
may be constructed in the following way : 

(i) Find the mid-point of each class. 

(ii) Mark the mid-points on the X-axis on a suitable scale. 

(iii) Mark the frequencies on the Y-axis on a 
suitable scale. 

(iv) Plot a point for each mid-point against its corresponding 
frequency, and join the points by straight lines. 
Illustration 14. 

Draw a frequency polygon for the following data : 


Age (in years)    20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60
No. of Persons    1      3      6      10     14     9      5      2


Solution : 
Taking the mid-values on the X-axis and frequencies on the Y-axis,
the required frequency polygon is as follows :

[Figure : Frequency polygon. X-axis : Mid-values ; Y-axis : No. of Persons]
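
The points of the polygon in Illustration 14 can be listed with a few lines of code. This is a minimal sketch: the variable names are illustrative, and the two closing points on the base line are placed one class-width beyond the first and last mid-values, which is an assumption about how the polygon is closed.

```python
# Data of Illustration 14.
classes = [(20, 25), (25, 30), (30, 35), (35, 40),
           (40, 45), (45, 50), (50, 55), (55, 60)]
persons = [1, 3, 6, 10, 14, 9, 5, 2]

# Mid-point of each class = (lower limit + upper limit) / 2.
mid_values = [(lower + upper) / 2 for lower, upper in classes]

# Width of a class (all classes are equal here).
width = classes[0][1] - classes[0][0]

# Vertices of the polygon: the plotted points, closed on the base line
# at the mid-values of the (empty) classes just before and just after.
vertices = (
    [(mid_values[0] - width, 0)]
    + list(zip(mid_values, persons))
    + [(mid_values[-1] + width, 0)]
)

for x, y in vertices:
    print(x, y)
```

Joining these vertices in order by straight lines gives the frequency polygon.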


FREQUENCY CURVE 


If the mid-points of the tops of the rectangles of a histogram are
joined not by straight lines but by a free-hand curve, we get a
frequency curve. Here an effort is made to get a smooth curve.


Illustration 15. 
Represent the following frequency distribution by a frequency curve :


Class        0-10  10-20  20-30  30-40  40-50  50-60  60-70
Frequency    6     8      10     15     13     8      3

Solution :

Class     Mid Point   Frequency
0-10      5           6
10-20     15          8
20-30     25          10
30-40     35          15
40-50     45          13
50-60     55          8
60-70     65          3

[Figure : Frequency curve. X-axis : Mid Point ; Y-axis : Frequency]


Illustration 16. 
Construct Frequency Polygon and Frequency Curve to show the data given
below :


Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90  90-100
No. of Students   6     8      10     14     20     34     18     16     12     8


Solution : 
Histogram, Frequency Polygon and Frequency Curve 
CUMULATIVE FREQUENCY POLYGON


The graph constructed for a grouped frequency distribution on the
basis of cumulative frequencies is known as a cumulative frequency
polygon, cumulative frequency curve or ogive curve. Cumulative
frequency curves may be constructed in the following two ways :


(a) Less than type cumulative frequency curve : In this
method, the upper limit of each class is plotted on the abscissa
(X-axis) and the less than cumulative frequency on the ordinate
(Y-axis). The points so plotted are joined by straight lines. Since
the less than cumulative frequencies are in increasing order, the
polygon always rises from bottom to top.

(b) More than type cumulative frequency curve : In this
method, the lower limit of each class is plotted on the abscissa
(X-axis) and the more than cumulative frequency on the ordinate
(Y-axis). The points so obtained are joined by straight lines. Since
the more than cumulative frequencies are in decreasing order, the
polygon always declines from top to bottom.

Utility of Ogive 

1. An ogive is a method of representing frequency distributions
through diagrams.

2. Median, quartiles, deciles, percentiles, etc. can be located with
the help of this curve.

3. If any item-value is taken on the X-axis, the corresponding less
than or more than cumulative frequency may be read on the Y-axis.


Illustration 17. 
Draw a less than type cumulative frequency polygon for the following data : 


Class        0-5  5-10  10-15  15-20  20-25  25-30  30-35  35-40
Frequency    2    8     15     20     16     10     6      3
Solution :
Step : (i) Write the upper limits and the less than cumulative
frequencies :

Upper Limit            5   10   15   20   25   30   35   40
Less than type C.F.    2   10   25   45   61   71   77   80

(ii) Plot the upper limits on the X-axis and the cumulative
frequencies on the Y-axis. Join the points so obtained by straight
lines (if a curve is to be made, join the points by a free-hand curve).
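
Step (i) is a running total, so it can be checked mechanically. The sketch below uses `itertools.accumulate` from the Python standard library; the variable names are illustrative only.

```python
from itertools import accumulate

# Data of Illustration 17.
upper_limits = [5, 10, 15, 20, 25, 30, 35, 40]
frequencies = [2, 8, 15, 20, 16, 10, 6, 3]

# Less than cumulative frequency = running total of the frequencies.
less_than_cf = list(accumulate(frequencies))

# Points to be plotted: (upper limit, less than cumulative frequency).
points = list(zip(upper_limits, less_than_cf))
print(points)
```

The last cumulative frequency always equals the total frequency (80 here), which is a quick check on the arithmetic.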


The required cumulative frequency curve is as follows :

[Figure : Less than cumulative frequency curve. Scale : X-axis : 1 cm = 5 units ; Y-axis : 1 cm = 10 units. Y-axis : Less than C.F.]


Illustration 18. 
Given below is the pre-tax monthly income of residents of an industrial town :


Pre-tax income (₹)     No. of Residents ('000)
Less than 1,000        5
Less than 2,000        20
Less than 3,000        35
Less than 4,000        45
Less than 5,000        50
Less than 6,000        52
Less than 7,000        58
Less than 8,000        60

Draw a ‘more than’ type ogive for these data. 
Solution: 

Convert the less than cumulative frequency distribution into simple 
frequency distribution and then into more than cumulative frequency 
distribution. Plot the lower limits and the more than cumulative 
frequencies on the graph paper and join these points by straight line. 
This is the required ogive. 


Class ('000)   Frequency       Cumulative frequency (More than)   Point
0-1            5               60                                 (0, 60)
1-2            20 - 5 = 15     55                                 (1, 55)
2-3            35 - 20 = 15    40                                 (2, 40)
3-4            45 - 35 = 10    25                                 (3, 25)
4-5            50 - 45 = 5     15                                 (4, 15)
5-6            52 - 50 = 2     10                                 (5, 10)
6-7            58 - 52 = 6     8                                  (6, 8)
7-8            60 - 58 = 2     2                                  (7, 2)
Total          60

[Figure : More than type ogive. X-axis : Lower limit ('000) ; Y-axis : More than cumulative frequency]
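
The conversion used in Illustration 18 (less than cumulative frequencies, then simple frequencies, then more than cumulative frequencies) can be sketched as follows. The variable names are illustrative; the lower limit of the first class is taken as 0, as in the solution table.

```python
# Less than cumulative frequencies ('000 residents) for incomes below
# 1,000, 2,000, ..., 8,000 (data of Illustration 18).
less_than_cf = [5, 20, 35, 45, 50, 52, 58, 60]

# Simple frequency of a class = its cumulative frequency minus the previous one.
frequencies = [less_than_cf[0]] + [
    later - earlier for earlier, later in zip(less_than_cf, less_than_cf[1:])
]

# More than cumulative frequency at a lower limit
# = total minus everything that lies below that limit.
total = less_than_cf[-1]
more_than_cf = [total - below for below in [0] + less_than_cf[:-1]]

# Lower limits in thousands of rupees: 0, 1, ..., 7.
lower_limits = list(range(len(less_than_cf)))
points = list(zip(lower_limits, more_than_cf))
print(frequencies)
print(points)
```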
Illustration 19. 
Prepare cumulative frequency polygon (or curve) : 


Marks        0-30  30-40  40-50  50-60  60-70  70-100
Frequency    10    15     30     32     8      5
Solution :

Here the type of cumulative frequency curve (less than or more than)
is not specified, so we will construct the less than type cumulative
frequency curve :

Marks     Frequency   Cumulative Frequency (Less than)
0-30      10          10
30-40     15          25
40-50     30          55
50-60     32          87
60-70     8           95
70-100    5           100

Points to be plotted are : (30, 10), (40, 25), (50, 55), (60, 87),
(70, 95), (100, 100)

[Figure : Less than Type Cumulative Frequency Curve. X-axis : Marks ; Y-axis : Cumulative frequency]


Illustration 20. 
Represent the following data by more than cumulative frequency polygon : 


Class        0-5  5-10  10-15  15-20  20-25  25-30  30-35  35-40
Frequency    4    6     10     10     25     22     18     5
Solution :

Class     Frequency   Cumulative frequency (More than)
0-5       4           100
5-10      6           96
10-15     10          90
15-20     10          80
20-25     25          70
25-30     22          45
30-35     18          23
35-40     5           5

The points to be plotted : (0, 100), (5, 96), (10, 90), (15, 80),
(20, 70), (25, 45), (30, 23), (35, 5)

[Figure : More than type Cumulative Frequency Curve]
DETERMINATION OF MEDIAN BY GRAPHIC METHOD


To find the median, mark N/2 on the ordinate (Y-axis). From that
point draw a straight line parallel to the abscissa (X-axis). From
the point where this line meets the cumulative frequency curve, draw
a perpendicular on the abscissa. The point where the perpendicular
meets the abscissa is the median value.


Illustration 21. 
Prepare both types of ogive from the following data and find median : 


Weight (in kg)   30-34  35-39  40-44  45-49  50-54  55-59  60-64
Frequency        3      5      12     18     14     6      2
Solution :

                                     Cumulative Frequency
Weight    Class        Frequency    Less than    More than
30-34     29.5-34.5    3            3            60
35-39     34.5-39.5    5            8            57
40-44     39.5-44.5    12           20           52
45-49     44.5-49.5    18           38           40
50-54     49.5-54.5    14           52           22
55-59     54.5-59.5    6            58           8
60-64     59.5-64.5    2            60           2

[Figure : Less than type and more than type cumulative frequency curves. Y-axis : Cumulative Frequency]


Corresponding to the cumulative frequency N/2 = 60/2 = 30, the
median is = 47.3 (approximately)


Illustration 22. 
The marks obtained in Statistics by 100 B. Com. students of a University are
given below :


Marks             0-5  5-10  10-15  15-20  20-25  25-30  30-35  35-40
No. of Students   4    6     10     10     25     22     18     5

Draw a cumulative frequency curve and read off the values of Median and both 
the Quartiles. 
Solution : 

First, the simple frequency distribution is converted into a less
than cumulative frequency distribution in the following way :


Marks less than         5   10   15   20   25   30   35   40
Cumulative frequency    4   10   20   30   55   77   95   100

The cumulative frequency curve is constructed as follows :

[Figure : Less than Type Cumulative Frequency Polygon or Ogive Curve. X-axis : Marks ; Y-axis : Cumulative frequency]

$q_1 = \frac{N}{4} = \frac{100}{4} = 25$, $q_2 = \frac{2N}{4} = \frac{N}{2} = 50$, $q_3 = \frac{3N}{4} = 75$

Locate $Q_1$, $Q_2$, $Q_3$ by drawing proper lines.

First Quartile, $Q_1$ = value of the $q_1$ th item = 17.5 marks

Second Quartile (Median), $Q_2$ = value of the $q_2$ th item = 24 marks
Third Quartile, $Q_3$ = value of the $q_3$ th item = 29.5 marks
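
The readings in Illustration 22 can be checked by treating the ogive as a set of straight-line segments and interpolating along the segment that contains the required item. This is a sketch under stated assumptions: the function name `read_from_ogive` is introduced here, and the starting point (0, 0), the lower limit of the first class with cumulative frequency zero, is added so that the first segment is defined.

```python
def read_from_ogive(points, target_cf):
    """Read the X-value for a given cumulative frequency on a less than ogive.

    `points` is a list of (upper limit, cumulative frequency) pairs in
    increasing order; straight lines are assumed between successive points.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target_cf <= c1 and c1 > c0:
            return x0 + (target_cf - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("cumulative frequency lies outside the ogive")


# Less than cumulative frequencies of Illustration 22, preceded by (0, 0).
ogive = [(0, 0), (5, 4), (10, 10), (15, 20), (20, 30),
         (25, 55), (30, 77), (35, 95), (40, 100)]
n = 100

q1 = read_from_ogive(ogive, n / 4)        # 25th item
median = read_from_ogive(ogive, n / 2)    # 50th item
q3 = read_from_ogive(ogive, 3 * n / 4)    # 75th item
print(q1, median, round(q3, 1))
```

The same function gives deciles and percentiles when the target is set to k N/10 or k N/100. Values read from a free-hand curve may differ slightly from these straight-line results.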


Illustration 23.


Draw less than type ogive for the following and locate quartiles,
seventh decile and fifty-fifth percentile :


Mid-value :    32  37  42  47  52  57  62
Frequency :    6   10  24  36  28  12  4
Solution :


Steps : (i) First, construct the less than type ogive curve.
(ii) For $Q_1$, mark $q_1 = \frac{N}{4}$ on the Y-axis and from it draw
a line parallel to the X-axis up to the curve. From the point where
this line meets the curve, draw a perpendicular on the X-axis. The
point where the perpendicular meets the X-axis gives $Q_1$.

(iii) Similarly locate $Q_2$, $Q_3$, $D_7$ and $P_{55}$.


Mid-value   Class        Frequency   Cumulative Frequency
32          29.5-34.5    6           6
37          34.5-39.5    10          16
42          39.5-44.5    24          40
47          44.5-49.5    36          76
52          49.5-54.5    28          104
57          54.5-59.5    12          116
62          59.5-64.5    4           120
Total                    120


Points to be plotted :
(34.5, 6), (39.5, 16), (44.5, 40), (49.5, 76), (54.5, 104), (59.5, 116),
(64.5, 120)
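
When only mid-values are given, the class boundaries have to be rebuilt before the ogive can be plotted. The sketch below assumes equally spaced mid-values, so that each class extends half the spacing on either side of its mid-value; the variable names are illustrative.

```python
from itertools import accumulate

# Data of Illustration 23.
mid_values = [32, 37, 42, 47, 52, 57, 62]
frequencies = [6, 10, 24, 36, 28, 12, 4]

# Width of a class = distance between two successive mid-values.
width = mid_values[1] - mid_values[0]

# Each class runs from (mid-value - width/2) to (mid-value + width/2).
boundaries = [(m - width / 2, m + width / 2) for m in mid_values]

# Points of the less than ogive: (upper boundary, cumulative frequency).
cumulative = list(accumulate(frequencies))
points = [(upper, cf) for (_, upper), cf in zip(boundaries, cumulative)]
print(boundaries)
print(points)
```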


$q_1 = 30$, $q_2 = 60$, $q_3 = 90$, $d_7 = 84$, $p_{55} = 66$
$Q_1 = 43$, $Q_2 = 48$, $Q_3 = 52$, $D_7 = 50$, $P_{55} = 49$


[Figure : Less than type Cumulative Frequency Curve. X-axis : Upper limit ; Y-axis : Cumulative Frequency]


Illustration 24. 
Draw a cumulative frequency polygon (or ogive) from the following data :


Daily Wages (in rupees)   No. of Workers
20-24                     12
25-29                     20
30-34                     40
35-39                     62
40-44                     46
45-49                     20
Total                     200

With the help of this graph answer the following :

(a) Percentage of workers who get : (i) more than ₹ 42, (ii) between
₹ 31 and ₹ 48, and (iii) less than ₹ 36.

(b) Suppose A is one of the workers. One-third of the workers get
less than him. Find A's wages.


Solution :
Write the class-intervals in exclusive form and find the less than
cumulative frequencies :

Wages (less than)       24.5  29.5  34.5  39.5  44.5  49.5
Cumulative frequency    12    32    72    134   180   200

Construct the less than cumulative frequency polygon.

[Figure : Less than cumulative frequency polygon. X-axis : Daily wage (Upper limit) ; Y-axis : Cumulative frequency]
Obtain the answers to the relevant questions in the following way :
(a) (i) To find the number of workers who get more than ₹ 42, take
the point 42 on the abscissa and draw a line parallel to the Y-axis
from that point. From the point where this line meets the polygon,
draw a line parallel to the X-axis. Read the point where this line
touches the Y-axis. Thus,
No. of workers who get less than ₹ 42 = 152
∴ No. of workers who get more than ₹ 42 = 200 - 152 = 48

So, Required percentage = $\frac{48}{200} \times 100 = 24$

(ii) Similarly,

The number of workers who get less than ₹ 31 = 44

The number of workers who get less than ₹ 48 = 192

∴ The number of workers who get between ₹ 31 and ₹ 48 = 192 - 44 = 148

So, Required percentage = $\frac{148}{200} \times 100 = 74$

(iii) The number of workers who get less than ₹ 36 = 92

So, Required percentage = $\frac{92}{200} \times 100 = 46$

(b) To find the daily wages of A, mark the point
$\frac{N}{3} = \frac{200}{3} = 66.7 \approx 67$ on the Y-axis.

From that point draw a line parallel to the X-axis. From the point
where this line touches the polygon, draw a perpendicular on the
X-axis and read the point where it meets the axis. This is the
required daily wage of worker A.

Exercise 14(C) 
Histogram 
1. Construct a histogram from the following data : 
Weekly Wages (₹)       10-15  15-20  20-25  25-30  30-35  35-40  40-45
No. of Wage Earners    5      15     20     16     14     10     6
2. Construct a histogram from the data given below :
Class            50-60  60-70  70-80  80-90  90-100  100-110  110-120
Frequency ( f )  8      10     16     14     10      5        2
3. Prepare a histogram from the following distribution :
Quantity of Milk (in litre)   4-6  6-8  8-10  10-12  12-14  14-16
No. of Cows                   12   38   23    15     4      2


4. Draw a histogram from the following data :


Wages less than (₹) 24.5 29.5 34.5 39.5 44.5 49.5

Frequency 12 32 72 134 180 200 

5. Draw a histogram for the following frequency distribution regarding the life of 
electric lamps (in hours) : 

Mid-value 1010 1030 1050 1070 1090 1110 

No. of Lamps 10 137 482 360 140 18 

6. From the following data draw a histogram on graph paper : 

Wages (in rupees) 50—55 55-60 60-65 65-70 70-80 80-100 

No. of Workers 10 18 40 25 32 24 

[ Note : Due to unequal width of C.I., the heights of rectangles will be : 2, 3.6, 8,
5, 3.2, 1.2]

7. Construct a Histogram from the following data : 

Weekly Wages (₹) 10-15 15-20 20-25 25-30 30-40 40-60 60-80

No. of Wage Earners 7 19 27 15 12 21 8 


8. Draw a histogram to represent the following frequency distribution : 
Profit per Shop 0-50 50-100 100-200 200-300 300-500 

No. of Shops 120 180 54 600 170 

9. Construct a histogram to represent the following frequency distribution : 
Size 15-25 25-35 35-50 50-65 65-85 85-95 

Frequency ( f ) 15 36 60 45 30 24 

10. Represent the following data by a histogram : 

Class 70-80 80-90 90-100 100-120 120-140 140-150 


Frequency ( f ) 12 16 35 120 100 24

Mode from the Histogram


11. From the following figures construct a histogram and determine the value of 


mode : 


Class-interval 0-10 10—20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 

Frequency 3 7 13 22 35 14 8 6 4

12. From the following figures construct a histogram and determine the value of 
mode : 

Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70 

Frequency ( f ) 4 8 14 20 30 15 6

13. Represent the following data on graph by histogram and find out mode : 

Marks 0-20 20-40 40-60 60-80 80-100 

No. of Students 25 48 75 60 15 

14. Constructing a Histogram for the following data, find mode : 

Marks 0-9 10-19 20-29 30-39 40-49 50-59 60-69 

Student 2 6 10 15 20 12 4 

15. Represent the following data in the form of a histogram and locate the value of 
mode : 

Mid-Age (in year) 20 25 30 35 40 45 50 

Frequency 4 18 20 15 12 4 2

16. Represent the following distribution by a histogram and find mode : 

Marks less than 10 20 30 40 50 60 70 80 90 

No. of Students 7 19 37 61 93 121 143 158 179 

17. Represent the following frequency distribution by histogram and locate the 
mode : 

Age more than (in years) 15 20 25 30 35 40 45 


No. of Persons 400 350 289 197 122 62 22 
Cumulative Frequency Curve/Ogive Curve 
18. The marks obtained by 100 students in an examination are given below. Draw 


more than type ogive curve : 


Marks 0-5 5—10 10-15 15-20 20—25 25-30 30-35 35—40 


No. of Students 4 6 10 10 25 22 18 5

19. Draw less than type ogive curve from the following data : 

Class 10-25 25-40 40-55 55-70 70-85 85-100 

Frequency ( f ) 7 12 16 9 5 3

20. Draw both the ogives from the following data :

Class 20-40 40-60 60-80 80-100 100-120 120-140 140-160

Frequency ( f ) 4 6 10 16 12 7 3

21. Draw ogives (both less than and more than types) from the following 
frequency distribution : 

Working days in a year 200-219 220-239 240-259 260-279 280-299

No. of Colleges 10 18 22 15 7

Median and Partition values by Graph 


22. Draw an ogive from the following data and from it find the median and 
quartiles : 

Class 0-20 20-30 30-40 40-50 50-60 60-70 70-90 

Frequency ( f ) 21 19 60 42 24 18 17

[ Ans.: $Q_1 = 32$, $Q_2 = 40$, $Q_3 = 50$]

23. Draw an ogive for the following frequency distribution. Read the median from 
the graph : 

Daily Wages (in ₹) 50-55 55-60 60-65 65-70 70-75 75-80 80-100

No. of Workers : 6 10 22 30 16 12 15 

24. Construct an ogive curve and find the value of median and both quartiles : 

Wage (₹) 10-15 15-20 20-25 25-30 30-35 35-40 40-45

Wage earners 5 8 15 20 16 10 6

25. Draw a cumulative frequency curve for the following data and locate median 


and quartiles on it: 


Class Freq. ( f ) 
1-5 7 

6-10 10 

11-15 16 

16—20 32 

21-25 24 

26-30 18 

31-35 10 

36-40 5 

41-45 1 


26. From the data given below draw an ogive and find the values of median and 
quartiles from the graph drawn : 

Mid-value 10 20 30 40 50 60 

Frequency 15 32 51 71 97 100 


27. Construct a cumulative frequency polygon and locate graphically median and 


quartiles : 
Class Frequency 
0-10 6
10-20 8 
20-30 12 
30-40 19 
40-50 11 
50-60 15 
60-70 12 
70-80 7 
80-90 6 
90-100 4 


28. Using the following table, draw ‘less than’ ogive and determine the values of 
the median and two quartiles : 

Class : Freq. ( f ) 

0-7 1 

7-14 4 


14-21 8 
21-28 14 


28-35 17 
35-42 13 
42-49 10 
49-56 7 


29. Given below are the monthly expenditures of a hostel in a college : 


Monthly Exp. (00, rupees) No. of Students 
10-12 6 

12-14 9 

14-16 15 

16-18 30 

18-20 12 

20-22 6 

22-24 3 


30. Draw less than cumulative frequency polygon for the following data and locate 
median, upper quartile, 7th decile and 60th percentile : 


Class Frequency ( f ) 
20-30 3 

30-40 9 

40-50 33 

50-60 63 

60-70 129 

70-80 96 

80-90 27 


THEORETICAL QUESTIONS 


Long Answer Type Questions 

1. Show clearly the necessity and importance of diagrams in Statistics. (Meerut
2000)

2. What precautions should be taken in drawing a good diagram ? (Meerut 2000) 


3. Write short notes on the following : 
(i) Bar and pie diagrams 


(ii) Duo-directional bar diagram 
(iii) Percentage subdivided diagram 
(iv) Two dimensional diagram 


4. Explain the utility of representing data by diagrams and discuss the method of 
forming a divided circular diagram. 

5. “Diagrams help us to visualize the whole meaning of a numerical complex at a
single glance.” Comment.

6. What are the merits and limitations of diagrammatic representation of statistical 
data ? 


7. Explain the following with examples : 
(i) Pie Diagram 
(ii) False Base Line and its utility 


8. Write short notes on the following : 

(i) Merits of diagrammatic representation 

(ii) Usefulness of diagrammatic representation 
(iii) Limitations of diagrammatic representation 


9. Discuss in detail the different modes of graphical representation of frequency 
distributions. 

10. What do you understand by a histogram ? Explain with the help of an example
how it is constructed in the case of a continuous variable with unequal
class-intervals.

11. How could you locate mode, median, quartiles, deciles and percentiles on 


graph paper ? 


OBJECTIVE QUESTIONS 
Choose the correct answers 


1. Which of the following can not be calculated by graphical method ? 
(a) Mean (b) Median 
(c) Mode (d) Quartile 


2. The other name of ogive curve is : 

(a) Frequency histogram (b) Frequency Polygon 

(c) Cumulative frequency curve (d) None of these

3. In the construction of histogram the importance is given to : 
(a) Thickness (b) Height 

(c) Breadth (d) None of these 


4. The point of intersection of the two cumulative frequency curves provides : 
(a) Mean (b) Height 
(c) Breadth (d) None of these 




APPENDIX 


LOG, ANTILOG, RECIPROCAL TABLES AND 
THEIR USE 


LOGARITHM 


While solving problems we often face lengthy multiplications,
divisions, etc. They take much time and effort, and even then we
cannot be sure that the calculation is correct. Logarithms are of
great help in removing this difficulty. For simplicity, the base is
taken as 10 in numerical problems.


Definition : Let a, x and N be three numbers such that $a^x = N$;
then a is known as the base and x is called the logarithm of N to the
base a. In other words, the logarithm of a given number to a certain
base is the power to which that base must be raised in order to obtain
the given number.

$x = \log_a N$

We read it as x = logarithm of N to the base a.

$5^4 = 5 \times 5 \times 5 \times 5 = 625$

$\Rightarrow \log_5 625 = 4$


It means the logarithm of 625 to the base 5 is 4. Similarly,
$10^4 = 10{,}000$. It means the logarithm of 10,000 to the base 10 is 4.

$\Rightarrow \log_{10} 10{,}000 = 4$


In mathematical problems where the base is not stated, we can assume
the base to be 10.


COMMON LOGARITHM 


Description of the method is as follows :

$10^{1} = 10.00000 \therefore \log 10 = 1.00000$
$10^{1/2} = 3.16227 \therefore \log 3.16227 = 0.50000$
$10^{1/4} = 1.77827 \therefore \log 1.77827 = 0.25000$
$10^{1/8} = 1.33352 \therefore \log 1.33352 = 0.12500$
$10^{1/16} = 1.15478 \therefore \log 1.15478 = 0.06250$


On the left hand side each number is the square root of its
preceding number, while on the right hand side each number is just
half of its preceding number.

To determine the characteristic of a number more than 1 :


$10^0 = 1 \therefore \log 1 = 0$

$10^1 = 10 \therefore \log 10 = 1$

$10^2 = 100 \therefore \log 100 = 2$

$10^3 = 1000 \therefore \log 1000 = 3$ etc.

It means that a number whose integral part has one digit will lie
between $10^0$ and $10^1$, and its logarithm will lie between 0 and 1.
So its logarithm will consist of a decimal part only, and the
characteristic of its logarithm will be zero.

A number whose integral part has two digits will lie between $10^1$
and $10^2$ and its logarithm will lie between 1 and 2. So its logarithm
will be of the form 1 + decimal part and its characteristic will be 1.

In the same way, a number whose integral part has n digits will
lie between $10^{n-1}$ and $10^n$; its logarithm will be (n - 1) +
decimal part and its characteristic will be (n - 1).

Characteristic : The integral part of logarithm is known as 
characteristic. 

Mantissa : The decimal part of logarithm is known as mantissa. 

Characteristic : If the given number is more than 1, its
characteristic is (n - 1), where n is the number of digits in the
integral part. For example,


S. No.   Number    No. of digits in      Subtracting 1   Characteristic
                   the integral part
(i)      34567     5                     5 - 1 = 4       4
(ii)     3456.7    4                     4 - 1 = 3       3
(iii)    345.67    3                     3 - 1 = 2       2
(iv)     34.567    2                     2 - 1 = 1       1
(v)      3.4567    1                     1 - 1 = 0       0


In example (i) the number of digits in the integral part is 5, so the
characteristic will be 5 - 1 = 4. In example (ii) the number of
digits = 4, so the characteristic will be 4 - 1 = 3.
Consider the numbers not more than 1 which are integral powers of 10 :


$10^{0} = 1 \therefore \log_{10} 1 = 0$
$10^{-1} = 0.1 \therefore \log_{10} 0.1 = -1$
$10^{-2} = 0.01 \therefore \log_{10} 0.01 = -2$
$10^{-3} = 0.001 \therefore \log_{10} 0.001 = -3$


The characteristic of numbers less than 1 is always negative. To
get it, the formula (n + 1) is used and a negative sign is attached.
Here n is the number of zeros immediately after the decimal point and
just before the first non-zero digit.

S. No.   Number       No. of zeros after    Adding 1     Characteristic
                      the decimal point
(i)      .34567       0                     0 + 1 = 1    -1 or $\bar{1}$
(ii)     .034567      1                     1 + 1 = 2    -2 or $\bar{2}$
(iii)    .0034567     2                     2 + 1 = 3    -3 or $\bar{3}$
(iv)     .00034567    3                     3 + 1 = 4    -4 or $\bar{4}$

In example (i) there is no zero after the decimal point, so n = 0.
Similarly, in examples (ii), (iii) and (iv) there are 1, 2 and 3 zeros
respectively, so n will be 1, 2 and 3.


Mantissa : This is the decimal part of the logarithm. To find it
we use the table given at the end of the book. The table is divided
into rows and columns. The first vertical column contains the numbers
from 10 to 99. Each horizontal row has ten columns headed 0 to 9 and,
to their right, a set of small columns headed 1 to 9 (mean
difference). The columns headed 0 to 9 are for the third digit of a
number and the mean difference columns are for the fourth digit.
Mantissa is always positive. The characteristic is determined only by
inspection of the number of digits in the figure. If the number is
more than 1, the characteristic will be 1 less than the number of
digits before the decimal point; if the number is less than 1, the
characteristic will be negative and numerically one more than the
number of zeros just after the decimal point.
The following points should be noted about the mantissa : 
(1) Mantissa is always positive. 


(2) The position of decimal point has no effect on it i.e. , mantissa 
is the same for a number, if digits remain in the same order but 
decimal point varies. For example, the mantissa for the numbers 


3456, 345.6, 34.56 will be the same. 

(3) In order to see mantissa from the logarithm table, given number 
is reduced to 4 digits by approximation after removing the decimal 
point. 

(4) The first two digits of the approximated number are located in
the first vertical column of the logarithm table. In the horizontal
row against them, the entry in the column headed by the third digit is
the mantissa for the first three digits. The quantity appearing under
the ‘mean difference’ column for the fourth digit is added to this
mantissa, and thus the complete mantissa of the number is determined.


Find Characteristic and Mantissa 


Example : Suppose, we have to find the characteristic and
mantissa of 3456.

First we find the characteristic. From the formula discussed above
its characteristic will be (n - 1) = (4 - 1) = 3. When the
characteristic is determined, the mantissa is required. We see the
vertical column on the left side for the first two digits, the
horizontal row with columns 0 to 9 for the third digit, and the mean
difference for the fourth digit.

To find the logarithm of 3456 we see 34 in the vertical column and
see the horizontal row against 34 for the third digit 5. We find there
5378. Now we see the mean difference for the fourth digit 6. We find 8
in the mean difference column against 34. We add it to the first
mantissa, i.e., 5378 + 8 = 5386 will be the required mantissa. Writing
the characteristic first, we put the decimal point and then write the
mantissa, that is

Number (n)    log n
3456          3 + .5386 = 3.5386
345.6         2 + .5386 = 2.5386
34.56         1 + .5386 = 1.5386
3.456         0 + .5386 = 0.5386
.3456         $\bar{1}$ + .5386 = $\bar{1}.5386$
.03456        $\bar{2}$ + .5386 = $\bar{2}.5386$

Thus, the characteristic and mantissa are written. It should be
noted that the characteristic may be negative but the mantissa is
never negative, and the mantissa is unaffected by the position of the
decimal point in the number. That is, numbers less than 1 having the
same digits in the same order also have the same mantissa,
irrespective of their values.
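
The table readings above can be compared with a direct computation. The sketch below uses `math.log10` and `math.floor` from the Python standard library; the function name `characteristic_and_mantissa` is introduced here only for illustration, and rounding the mantissa to four places mirrors a four-figure table.

```python
import math


def characteristic_and_mantissa(number):
    """Split log10(number) into an integer characteristic and a mantissa.

    The characteristic is the greatest integer not exceeding the logarithm,
    so it is negative for numbers below 1, while the mantissa stays
    non-negative.
    """
    log_value = math.log10(number)
    characteristic = math.floor(log_value)
    mantissa = log_value - characteristic
    return characteristic, round(mantissa, 4)


for n in (3456, 345.6, 34.56, 3.456, 0.3456, 0.03456):
    print(n, characteristic_and_mantissa(n))
```

A result such as (-2, 0.5386) corresponds to the bar notation $\bar{2}.5386$: the negative sign belongs to the characteristic only.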


ANTILOGARITHM 


The antilogarithm of a number is that number whose logarithm is
the given number. The antilogarithm table is given at the end of
the book. In it the first vertical column contains the numbers ranging
from .00 to .99 and the horizontal rows are as before. While finding
the antilog from this table we do not look at the characteristic but
only at the mantissa. The mantissa must first be approximated to four
digits. The first two digits after the decimal point are located in
the vertical column, the third digit in the horizontal row against
them, and the fourth digit in the mean difference columns. After this,
if the characteristic n of the logarithm is positive or zero, the
whole part of the desired number has (n + 1) digits. If the
characteristic is negative, we use the formula (n - 1) to place the
decimal point, where n is the numerical value of the characteristic.


Example : Antilog 3.5386 = 3456

Antilog 2.5386 = 345.6

Antilog 0.5386 = 3.456

If the characteristic of the logarithm is negative, then after
reading the antilogarithm of the mantissa, the number of zeros just
after the decimal point will be one less than the numerical value of
the characteristic.


Example : Antilog $\bar{1}.6326$ = 0.4291

Antilog $\bar{2}.6326$ = 0.04291

Antilog $\bar{3}.6326$ = 0.004291

Antilog $\bar{4}.6326$ = 0.0004291

SOME USEFUL FORMULAE OF LOGARITHM

(i) $\log (m \times n) = \log m + \log n$, i.e., $m \times n$ = Antilog ($\log m + \log n$),
Multiplication,

(ii) $\log (m/n) = \log m - \log n$, i.e., $m/n$ = Antilog ($\log m - \log n$),
Division,

(iii) $\log m^n = n \log m$, i.e., $m^n$ = Antilog ($n \log m$), Raising to
power,

(iv) $\log \sqrt{m} = \log m^{1/2} = \frac{1}{2} \log m$, i.e., $\sqrt{m}$ = Antilog ($\frac{1}{2} \log m$), Root.

See the following examples in which characteristic, mantissa,
antilogarithm and the formulae are used.


(a) Solve $\frac{15 \times 2^7}{60}$

Solution : $\frac{15 \times 2^7}{60}$ = Antilog (log 15 + 7 log 2 - log 60)
= Antilog {1.1761 + 7(0.3010) - 1.7782}

= Antilog (1.1761 + 2.1070 - 1.7782)

= Antilog (1.5049)

= 31.99 = 32 (approximate)


(b) Solve $\frac{45 \times 20}{10^4}$

Solution : $\frac{45 \times 20}{10^4}$ = Antilog [log 45 + log 20 - 4 log 10]
= Antilog [1.6532 + 1.3010 - 4.0000]

= Antilog [2.9542 - 4.0000]

= Antilog $\bar{2}.9542$

= 0.08999 ≈ 0.09

In this example (2.9542 - 4.0000) is negative. In this type of
question, when the number having the negative sign is larger, only the
integral parts are subtracted, so that the characteristic becomes
negative while the mantissa remains positive. Put the (-) sign above
the characteristic, i.e.,

2.9542 - 4.0000 = $\bar{2}.9542$

Now we write Antilog $\bar{2}.9542$. The remaining calculation will be
the same as before.
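
Both worked examples can be verified by applying the same formulae with computed logarithms in place of table values. This is only a sketch: the helper `antilog` is a name introduced here, and the results differ from the table method only by rounding.

```python
import math


def antilog(x):
    """Antilogarithm to the base 10: the number whose logarithm is x."""
    return 10 ** x


# (a) 15 x 2^7 / 60 = Antilog (log 15 + 7 log 2 - log 60)
log_a = math.log10(15) + 7 * math.log10(2) - math.log10(60)
answer_a = antilog(log_a)

# (b) 45 x 20 / 10^4 = Antilog (log 45 + log 20 - 4 log 10)
log_b = math.log10(45) + math.log10(20) - 4 * math.log10(10)
answer_b = antilog(log_b)

# Bar notation for (b): negative characteristic, non-negative mantissa.
characteristic_b = math.floor(log_b)
mantissa_b = round(log_b - characteristic_b, 4)

print(round(log_a, 4), round(answer_a, 2))
print(characteristic_b, mantissa_b, round(answer_b, 4))
```

For (b) the logarithm is about -1.0458, which is written as a characteristic of -2 with mantissa .9542; this is the same value as the bar form obtained in the solution.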


USE OF RECIPROCAL 


The table of reciprocals is used to find the reciprocal of a
number. In this table, the reciprocals of the numbers written in the
first vertical column, from 1.0 to 9.9, are given against them. The
method of reading the table is the same as for the logarithm table.
The only difference is that in the reciprocal table the mean
difference, which is given for the fourth digit against the first two
digits, is subtracted from the reciprocal. Following are the rules to
place the decimal point :

(1) If the number is more than 1, the number of zeros after the
decimal point in the reciprocal will be 1 less than the number of
digits in the integral part of the number.

(2) If the number is less than 1, count the number of zeros after
the decimal point, add 1 to it, and place the decimal point in the
reciprocal after that many digits.

For example, suppose we have to find the reciprocal of 345. We go to
the column headed 5 against 34 in the first column on the left side.
2899 is written there. Now count the number of digits in the number
and decrease it by 1. Put that many zeros to the left of 2899 and
place the decimal point before them. This is the required reciprocal.

That is, there are 3 digits in 345, and the number 1 less than 3 is
2. Put two zeros to the left of 2899 and then place the decimal point.
It means the reciprocal of 345 will be .002899.

In the same way, suppose we have to find the reciprocal of 34.5. As
before, we go to the column headed 5 against 3.4, where we find 2899.
Now the number of digits in the integral part of 34.5 is 2. So one
zero is put before 2899 and then the decimal point is placed. That is,
the reciprocal of 34.5 will be .02899.

It will be clear from the following table :


S.No.    Number    The No. given    Reciprocal read    Reciprocal of the number
                   in the table     from the table     after adjusting zeros
                                                       and decimal point
(i)      345       3.45             .2899              .002899
(ii)     34.5      3.45             .2899              .02899
(iii)    3.45      3.45             .2899              .2899
(iv)     0.345     3.45             .2899              2.899
(v)      0.0345    3.45             .2899              28.99
(vi)     0.0034    3.40             .2941              294.1
(vii)    0.6394    6.394            .1564              1.564


In examples (iv) and (v), there is no integral part in the numbers,
i.e., the number is less than 1. In this situation, count the number
of zeros after the decimal point and add 1 to it. We place the decimal
point after that many digits. For .345 there is no zero after the
decimal point; adding 1 we have 1. We place the decimal point after
one digit from the left in the number 2899 obtained from the table.
Thus, we have reciprocal = 2.899. Similarly in example (v), the number
is .0345. Here the number of zeros after the decimal point is 1.
Adding 1 to it we get 2. So we place the decimal point after two
digits from the left in the number 2899 of the table, i.e., the
reciprocal will be 28.99.

For getting the reciprocal of a number having four digits, we
subtract the number written in the mean difference column from the
reciprocal.
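
The placement of the decimal point in each row of the table can be checked by direct division. The sketch below rounds the result to four significant figures, which is what a four-figure reciprocal table provides; the function name `reciprocal` and the rounding choice are assumptions made for this illustration.

```python
def reciprocal(number, significant=4):
    """Return 1 / number rounded to the given number of significant figures."""
    return float(f"{1 / number:.{significant}g}")


for n in (345, 34.5, 3.45, 0.345, 0.0345, 0.0034, 0.6394):
    print(n, reciprocal(n))
```

Moving the decimal point one place to the right in the number moves it one place to the left in the reciprocal, which is what rules (1) and (2) express.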




